## Problem 1 - Basic statistics of the data (2 points)

In this problem your task is to open and explore an NOAA weather data file using pandas. The data file name is `6153237444115dat.csv` and it is located in the `data` folder in this repository. 

### Tips for completing this problem

- Use **exactly** the same variable names as in the instructions because your answers will be automatically graded, and the tests that grade your answers rely on following the same formatting or variable naming as in the instructions.
- **Please do not**:


**Your score on this problem will be based on the following criteria:**

- Importing the pandas module
- Reading the data using pandas into a variable called `data`
- Calculating a number of values from the data file used in this problem
- Including comments that explain what most lines in the code do
- Answering a couple questions at the end of the problem
- Uploading your notebook to your GitHub repository for this week's exercise

### Part 1 

Your first task is to import pandas and read in the data file.

- Import the pandas module
- Read the data into a variable called `data` using pandas 

**Hint**: When reading the data, you need to consider a couple of things:

- The input file is located in the `data` folder, and you need to include this in the filepath when reading the file to pandas
- No-data values are specified with a varying number of `*` characters in the input file. You can tell this to pandas by specifying the following parameter in the pandas `read_csv()` function: `na_values=['*', '**', '***', '****', '*****', '******']`
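
As a minimal sketch of how `na_values` behaves, the example below reads a small made-up CSV string (not the real NOAA file) in which asterisks mark missing values. The sample columns and numbers are assumptions chosen only for illustration.

```python
import io
import pandas as pd

# Made-up sample in the spirit of the NOAA file: asterisks mark no-data values
sample_csv = io.StringIO(
    "USAF,TEMP,MAX\n"
    "28450,31,***\n"
    "28450,**,40\n"
)

# Tell pandas which strings should be treated as missing values
sample = pd.read_csv(
    sample_csv,
    na_values=["*", "**", "***", "****", "*****", "******"],
)

# The asterisk strings are now NaN, so both columns are numeric instead of text
print(sample)
print(sample.dtypes)
```

Without the `na_values` parameter, pandas would read the `TEMP` and `MAX` columns as text, and calculations such as `mean()` would not work on them as expected.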

In [1]:
# Pandas imported as pd
import pandas as pd

# Reading data and replacing *s with NaN
data = pd.read_csv("data/6153237444115dat.csv", na_values = ["*", "**", "***", "****", "*****", "******"])

# YOUR CODE HERE

In [2]:
data.head()

Unnamed: 0,USAF,WBAN,YR--MODAHRMN,DIR,SPD,GUS,CLG,SKC,L,M,...,SLP,ALT,STP,MAX,MIN,PCP01,PCP06,PCP24,PCPXX,SD
0,28450,99999,201705010000,174.0,10.0,14.0,,,,,...,1009.2,,984.1,,,,,,,35.0
1,28450,99999,201705010020,180.0,10.0,,4.0,,,,...,,29.74,,,,,,,,
2,28450,99999,201705010050,190.0,10.0,,4.0,,,,...,,29.74,,,,,,,,
3,28450,99999,201705010100,188.0,12.0,16.0,,,,,...,1009.1,,984.0,,,,,,,35.0
4,28450,99999,201705010120,200.0,13.0,,2.0,OBS,,,...,,29.74,,,,,,,,


### Part 2 

In the cell below, fill in the variable values to print answers to these questions:

- How many rows are there in the data (variable `rows`)?
- What are the column names (variable `column_names`)?
- What are the datatypes of the columns (variable `column_datatypes`)?
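
The three questions map onto three common pandas expressions. The sketch below shows them on a tiny made-up DataFrame called `toy` (a hypothetical stand-in, not the exercise data), so the difference between the number of columns and the column names is easy to see.

```python
import pandas as pd

# Small made-up DataFrame standing in for the weather data
toy = pd.DataFrame(
    {"USAF": [28450, 29980], "TEMP": [31.0, 37.0], "SKC": ["OBS", None]}
)

n_rows = len(toy)           # number of rows
names = list(toy.columns)   # column names as a plain Python list
types = toy.dtypes          # one data type per column

print(n_rows)
print(names)
print(types)
```

Note that `len(toy.columns)` would return the number of columns (3 here), not their names.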

In [3]:
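# Count the rows, list the column names, and get the data type of each column
row_count = len(data)
column_names = list(data.columns)
column_datatypes = data.dtypes

# YOUR CODE HERE

In [4]:
# Print the number of rows in the dataframe:
print(f"There are {row_count} rows")

There are 11694 rows


In [5]:
# Print the column names:
print(f"The columns are: \n{column_names}")

The columns are: 
['USAF', 'WBAN', 'YR--MODAHRMN', 'DIR', 'SPD', 'GUS', 'CLG', 'SKC', 'L', 'M', 'H', 'VSB', 'MW', 'MW.1', 'MW.2', 'MW.3', 'AW', 'AW.1', 'AW.2', 'AW.3', 'W', 'TEMP', 'DEWP', 'SLP', 'ALT', 'STP', 'MAX', 'MIN', 'PCP01', 'PCP06', 'PCP24', 'PCPXX', 'SD']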


In [6]:
# Print the column datatypes:
print(f"The column types are: \n{column_datatypes}")

The column types are: 
USAF              int64
WBAN              int64
YR--MODAHRMN      int64
DIR             float64
SPD             float64
GUS             float64
CLG             float64
SKC              object
L               float64
M               float64
H               float64
VSB             float64
MW              float64
MW.1            float64
MW.2            float64
MW.3            float64
AW              float64
AW.1            float64
AW.2            float64
AW.3            float64
W               float64
TEMP            float64
DEWP            float64
SLP             float64
ALT             float64
STP             float64
MAX             float64
MIN             float64
PCP01           float64
PCP06           float64
PCP24           float64
PCPXX           float64
SD              float64
dtype: object


### Part 3

In the cell below, fill in the variable values to print answers to these questions:

- What is the mean Fahrenheit temperature in the data (`temp_mean` variable calculated from the `TEMP` column)?
- What is the standard deviation of the maximum temperature (`temp_max_std` variable calculated from the `MAX` column)?
- How many unique stations exist in the data (`station_count` variable calculated from the `USAF` column)?
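
A minimal sketch of the three calculations on a made-up DataFrame (`toy` and its values are assumptions for illustration). It also shows two details worth knowing: pandas skips NaN values by default, and `std()` returns the sample standard deviation (`ddof=1`).

```python
import numpy as np
import pandas as pd

# Made-up data with one missing temperature
toy = pd.DataFrame(
    {
        "USAF": [28450, 28450, 29980, 29980],
        "TEMP": [30.0, 32.0, np.nan, 40.0],
    }
)

toy_mean = toy["TEMP"].mean()         # NaN is skipped: (30 + 32 + 40) / 3
toy_std = toy["TEMP"].std()           # sample standard deviation (ddof=1)
toy_stations = toy["USAF"].nunique()  # number of distinct station codes

print(toy_mean, round(toy_std, 2), toy_stations)
```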

In [7]:
# Calculate mean temperature
temp_mean = data["TEMP"].mean()

# Calculate standard deviation of the maximum temperature:
temp_max_std = data["MAX"].std()

# Calculate number of unique stations:
station_count = data["USAF"].nunique()

# YOUR CODE HERE

In [8]:
# Check mean temperature value
print(f"The mean temperature in Fahrenheit is {round(temp_mean,1)}")

The mean temperature in Fahrenheit is 52.2


In [9]:
# Check standard deviation value
print(f"The standard deviation of maximum temperature is {round(temp_max_std, 1)}")

The standard deviation of maximum temperature is 10.3


In [10]:
# Check number of stations value
print(f"The number of unique stations is {station_count}")

The number of unique stations is 2


## Problem 2 - Data manipulation and selection 

In this problem you will clean the data from our data file by removing no-data values, convert temperature values in Fahrenheit to Celsius, and split the data into separate datasets using the weather station identification code. We will start this problem by cleaning and converting our temperature data. Please perform the tasks below by writing your code into the code cells in each section.

### Tips for completing this problem

- Use **exactly** the same variable names as in the instructions because your answers will be automatically graded, and the tests that grade your answers rely on following the same formatting or variable naming as in the instructions.
- **Please do not**:



**Your score on this problem will be based on the following criteria:**

- Creating a new dataframe called `selected` that contains select columns from the data file
- Cleaning the new dataframe by removing no-data values
- Creating a new column for temperatures converted from Fahrenheit to Celsius
- Dividing the data into separate dataframes for the Helsinki Kumpula and Rovaniemi stations
- Saving the new dataframes to CSV files
- Including comments that explain what most lines in the code do
- Answering a couple questions at the end of the problem
- Uploading your notebook and data files to your GitHub repository for this week's exercise

### Part 1

The first step for this problem is to read the data file `6153237444115dat.csv` again into a variable `data` using pandas. Remember to specify the no-data values (you can copy your code from Problem 1).

In [11]:
# Reading data and replacing *s with NaN
data = pd.read_csv("data/6153237444115dat.csv", na_values = ["*", "**", "***", "****", "*****", "******"])
# YOUR CODE HERE

Check that the first rows of the DataFrame look ok:

In [12]:
data.head()

Unnamed: 0,USAF,WBAN,YR--MODAHRMN,DIR,SPD,GUS,CLG,SKC,L,M,...,SLP,ALT,STP,MAX,MIN,PCP01,PCP06,PCP24,PCPXX,SD
0,28450,99999,201705010000,174.0,10.0,14.0,,,,,...,1009.2,,984.1,,,,,,,35.0
1,28450,99999,201705010020,180.0,10.0,,4.0,,,,...,,29.74,,,,,,,,
2,28450,99999,201705010050,190.0,10.0,,4.0,,,,...,,29.74,,,,,,,,
3,28450,99999,201705010100,188.0,12.0,16.0,,,,,...,1009.1,,984.0,,,,,,,35.0
4,28450,99999,201705010120,200.0,13.0,,2.0,OBS,,,...,,29.74,,,,,,,,


Check the number of rows in the DataFrame:

In [13]:
print(f' There are {len(data)} rows in the DataFrame')

 There are 11694 rows in the DataFrame


### Part 2 

Next, your task is to subset the data and remove rows with missing temperature values.

- Select the columns `USAF`, `YR--MODAHRMN`, `TEMP`, `MAX`, and `MIN` from the `data` dataframe and assign them to the variable `selected`
- Remove all rows from `selected` that have NoData in the column `TEMP` using the `dropna()` function
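
The `subset` parameter is what restricts `dropna()` to a single column. As a small sketch on made-up data (`toy` is a hypothetical DataFrame), rows with NaN in `MAX` are kept as long as `TEMP` has a value:

```python
import numpy as np
import pandas as pd

# Made-up data: TEMP is missing in the second row, MAX is missing in the others
toy = pd.DataFrame(
    {
        "TEMP": [31.0, np.nan, 30.0],
        "MAX": [np.nan, 40.0, np.nan],
    }
)

# Drop only the rows where TEMP is NaN; NaN values in MAX are ignored
cleaned = toy.dropna(subset=["TEMP"])

print(cleaned)
```

Calling `dropna()` without `subset` would instead drop every row that has NaN in any column, which here would remove all three rows.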

In [14]:
# Selecting some of the columns from the DataFrame
selected = data[["USAF", "YR--MODAHRMN", "TEMP", "MAX", "MIN"]]

# Removing the rows that have NaN in the TEMP column
selected = selected.dropna(subset = ["TEMP"])

Check that you selected the correct column names:

In [15]:
selected.head()

Unnamed: 0,USAF,YR--MODAHRMN,TEMP,MAX,MIN
0,28450,201705010000,31.0,,
1,28450,201705010020,30.0,,
2,28450,201705010050,30.0,,
3,28450,201705010100,31.0,,
4,28450,201705010120,30.0,,


Check how many rows you have after removing the no-data values:

In [16]:
len(selected)

11691

### Part 3 

Next, you can convert the temperature values in Fahrenheit to Celsius.

- Create a new column in `selected` called `Celsius`
- Convert the Fahrenheit temperatures from `TEMP` using the conversion formula below and store the results in the new `Celsius` column.

$$
\Large
\begin{equation}
  T_{\mathrm{Celsius}} = (T_{\mathrm{Fahrenheit}} - 32)~/~1.8
\end{equation}
$$

- Round the values in the `Celsius` column to have 0 decimals (**do not** create a new column, update the current one)
- Convert the `Celsius` values into integers (**do not** create a new column, update the current one)
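
The rounding and integer conversion steps can be sketched as follows. This is an illustrative example on a made-up DataFrame (`toy` is an assumption), not the graded solution; it updates the same `Celsius` column in place of creating new ones.

```python
import pandas as pd

# Made-up Fahrenheit temperatures
toy = pd.DataFrame({"TEMP": [31.0, 37.0, 50.0]})

# Convert Fahrenheit to Celsius with the formula above
toy["Celsius"] = (toy["TEMP"] - 32) / 1.8

# Round to 0 decimals (the values are still floats at this point)
toy["Celsius"] = toy["Celsius"].round(0)

# Convert the rounded values to integers
toy["Celsius"] = toy["Celsius"].astype(int)

print(toy)
```

Rounding before `astype(int)` matters because `astype(int)` on its own truncates toward zero, so `-0.56` would become `0` rather than `-1`.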

In [17]:
# Defining a function that converts Fahrenheit to Celsius
def fahr_to_celsius(temp):
    return (temp - 32) / 1.8

# Creating a new column in the selected DataFrame and filling it with the converted data
selected["Celsius"] = selected["TEMP"].apply(fahr_to_celsius)

In [18]:
# Checking your dataframe
selected.head()

Unnamed: 0,USAF,YR--MODAHRMN,TEMP,MAX,MIN,Celsius
0,28450,201705010000,31.0,,,-0.555556
1,28450,201705010020,30.0,,,-1.111111
2,28450,201705010050,30.0,,,-1.111111
3,28450,201705010100,31.0,,,-0.555556
4,28450,201705010120,30.0,,,-1.111111


In [19]:
# Check data types
selected.dtypes

USAF              int64
YR--MODAHRMN      int64
TEMP            float64
MAX             float64
MIN             float64
Celsius         float64
dtype: object

### Part 4 

Your next task is to divide `selected` into two separate dataframes. Please use the given variable names and write your answer in the code cell below.

- Select all rows from the `selected` DataFrame with the `USAF` code `29980` and store them in a variable called `kumpula`
- Select all rows from the `selected` DataFrame with the `USAF` code `28450` and store them in a variable called `rovaniemi`
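
Selecting rows by station code is a boolean-mask operation. A minimal sketch on made-up data (`toy` and the `toy_*` names are hypothetical) is shown below; the optional `.copy()` makes each selection an independent DataFrame, which avoids a `SettingWithCopyWarning` if columns are added to it later.

```python
import pandas as pd

# Made-up data with the two station codes used in this exercise
toy = pd.DataFrame(
    {
        "USAF": [28450, 28450, 29980],
        "TEMP": [31.0, 30.0, 37.0],
    }
)

# The comparison produces a True/False Series that is used to filter the rows
toy_kumpula = toy[toy["USAF"] == 29980].copy()
toy_rovaniemi = toy[toy["USAF"] == 28450].copy()

print(len(toy_kumpula), len(toy_rovaniemi))
```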

In [20]:
# dividing selected dataframe into two separate dataframes 
kumpula = selected[selected["USAF"] == 29980]
rovaniemi = selected[selected["USAF"] == 28450]

# YOUR CODE HERE

In [21]:
# Check the dataframe
print(f"Kumpula: \n{kumpula.head()}\n")

Kumpula: 
       USAF  YR--MODAHRMN  TEMP  MAX  MIN   Celsius
8770  29980  201705010000  37.0  NaN  NaN  2.777778
8771  29980  201705010100  37.0  NaN  NaN  2.777778
8772  29980  201705010200  37.0  NaN  NaN  2.777778
8773  29980  201705010300  37.0  NaN  NaN  2.777778
8774  29980  201705010400  39.0  NaN  NaN  3.888889



In [22]:
# Check the dataframe
print(f"Rovaniemi: \n{rovaniemi.head()}\n")

Rovaniemi: 
    USAF  YR--MODAHRMN  TEMP  MAX  MIN   Celsius
0  28450  201705010000  31.0  NaN  NaN -0.555556
1  28450  201705010020  30.0  NaN  NaN -1.111111
2  28450  201705010050  30.0  NaN  NaN -1.111111
3  28450  201705010100  31.0  NaN  NaN -0.555556
4  28450  201705010120  30.0  NaN  NaN -1.111111



### Part 5 

Now you can save your selections to CSV files.

- Save the `kumpula` DataFrame in the file `Kumpula_temps_May_Aug_2017.csv` (CSV format)
- Save the `rovaniemi` DataFrame in the file `Rovaniemi_temps_May_Aug_2017.csv` (CSV format)

For each file, be sure to 

- Separate the columns with commas (`,`)
- Use only 2 decimals for the floating point numbers
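
Both requirements correspond to parameters of `to_csv()`. As a small sketch, calling `to_csv()` without a file path returns the CSV text as a string, which makes it easy to check the formatting before writing a file (`toy` is a made-up DataFrame used only for illustration):

```python
import pandas as pd

# Made-up data with an integer column and a float column
toy = pd.DataFrame({"USAF": [29980, 29980], "Celsius": [2.777778, 3.888889]})

# No path given, so the CSV content is returned as a string
csv_text = toy.to_csv(sep=",", float_format="%.2f", index=False)

print(csv_text)
```

`float_format` affects only floating point columns, so the integer station codes are written unchanged, and `index=False` leaves out the row index.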



In [23]:
# Defining the output file paths for the two dataframes (relative to the notebook, in the data folder)
output_k = "data/Kumpula_temps_May_Aug_2017.csv"
output_r = "data/Rovaniemi_temps_May_Aug_2017.csv"

# Saving the dataframes
kumpula.to_csv(output_k, sep=',', float_format='%.2f', index = False)
rovaniemi.to_csv(output_r, sep=',', float_format='%.2f', index = False)

In [24]:
#Read-only cell for hidden tests :)

### Problem 2 summary

- Was anything unclear to you in Problem 2?
- Did you encounter any problems with decimal formatting?

#### Answers:

- No
- Yes. At first, I didn't know how to write the floating point values with only 2 decimals; then I searched and found out.

Also, make sure you:

- Check that your code includes informative comments explaining what your code does
- Commit and push your changes to your GitHub repository for Exercise 5 (including your 2 new data files)



## Problem 3 - Data analysis 

In this problem we will explore our temperature data by comparing spring temperatures between Helsinki Kumpula and Rovaniemi. To do this we'll use some conditions to extract subsets of our data and then analyse these subsets using basic pandas functions. Please perform the tasks below by writing your code into the code cells in each section.

### Tips for completing this problem

- Use **exactly** the same variable names as in the instructions because your answers will be automatically graded, and the tests that grade your answers rely on following the same formatting or variable naming as in the instructions.
- **Please do not**:

   


**Your score on this problem will be based on the following criteria:**

- Calculating the median temperatures for Helsinki Kumpula and Rovaniemi for the summer of 2017
- Selecting temperatures for May and June 2017 in separate dataframes for each location
- Printing out some summary values for each month (May, June) and location (Kumpula, Rovaniemi)
- Including comments that explain what most lines in the code do
- Answering a couple questions at the end of the problem
- Uploading your notebook and data files to your GitHub repository for this week's exercise

### Part 1 

First, you need to load the data from Problem 2.

- Read in the csv files generated in Problem 2 to the variables `kumpula` and `rovaniemi`

In [25]:
# Reading DataFrames for kumpula and rovaniemi
kumpula = pd.read_csv("data/Kumpula_temps_May_Aug_2017.csv")
rovaniemi = pd.read_csv("data/Rovaniemi_temps_May_Aug_2017.csv")

In [26]:
# Printing DataFrames
print(kumpula.head())
print("")
print(rovaniemi.head())

    USAF  YR--MODAHRMN  TEMP  MAX  MIN  Celsius
0  29980  201705010000  37.0  NaN  NaN     2.78
1  29980  201705010100  37.0  NaN  NaN     2.78
2  29980  201705010200  37.0  NaN  NaN     2.78
3  29980  201705010300  37.0  NaN  NaN     2.78
4  29980  201705010400  39.0  NaN  NaN     3.89

    USAF  YR--MODAHRMN  TEMP  MAX  MIN  Celsius
0  28450  201705010000  31.0  NaN  NaN    -0.56
1  28450  201705010020  30.0  NaN  NaN    -1.11
2  28450  201705010050  30.0  NaN  NaN    -1.11
3  28450  201705010100  31.0  NaN  NaN    -0.56
4  28450  201705010120  30.0  NaN  NaN    -1.11


### Part 2 

Next you can find the *median temperatures* for the period the data covers.

- What was the median Celsius temperature during the observed period in:
    - Helsinki Kumpula? (store the answer in a variable `kumpula_median`)
    - Rovaniemi? (store the answer in a variable `rovaniemi_median`)

In [27]:
# Calculating medians of Celsius for the two stations
kumpula_median = kumpula["Celsius"].median()
rovaniemi_median = rovaniemi["Celsius"].median()

In [28]:
# Prints the median temperatures
print(f"Kumpula median: {kumpula_median}")
print(f"Rovaniemi median: {rovaniemi_median}")

Kumpula median: 14.44
Rovaniemi median: 11.11


### Part 3 

The median temperatures above consider data from the entire summer (May-Aug), hence the differences might not be so clear. Let's now find the *mean temperatures* from May and June 2017 in Kumpula and Rovaniemi.

- From the `kumpula` and `rovaniemi` DataFrames, select the rows where values of the `YR--MODAHRMN` column are from May 2017
    - Assign these selected rows to the variables `kumpula_may` and `rovaniemi_may` (you can check the 
- Repeat the procedure for the month of June and assign those values to the variables `kumpula_june` and `rovaniemi_june`
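
One possible way to pick a month, sketched here on made-up data, is to use the fact that the timestamps are integers of the form `YYYYMMDDHHMM`: integer division by `10**6` drops the last six digits (day, hour, minute) and leaves `YYYYMM`. The `toy` DataFrame and the `year_month` name are assumptions for illustration; comparing against lower and upper timestamp limits works equally well.

```python
import pandas as pd

# Made-up timestamps in the YYYYMMDDHHMM integer format
toy = pd.DataFrame(
    {
        "YR--MODAHRMN": [201705010000, 201705312350, 201706010000, 201707010000],
        "Celsius": [2.78, -1.11, 5.56, 15.0],
    }
)

# Dropping the last six digits (DDHHMM) leaves the year and month (YYYYMM)
year_month = toy["YR--MODAHRMN"] // 10**6

toy_may = toy[year_month == 201705]
toy_june = toy[year_month == 201706]

print(toy_may)
print(toy_june)
```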

In [29]:
# Select the subset of the Kumpula and Rovaniemi data for the 5th and 6th month

# Selecting May data for kumpula 
kumpula_may = kumpula[(kumpula["YR--MODAHRMN"] >= 201705010000) & (kumpula["YR--MODAHRMN"] < 201706010000)]

# Selecting May data for rovaniemi 
rovaniemi_may = rovaniemi[(rovaniemi["YR--MODAHRMN"] >= 201705010000) & (rovaniemi["YR--MODAHRMN"] < 201706010000)]

# Selecting June data for kumpula 
kumpula_june = kumpula[(kumpula["YR--MODAHRMN"] >= 201706010000) & (kumpula["YR--MODAHRMN"] < 201707010000)]

# Selecting June data for rovaniemi 
rovaniemi_june = rovaniemi[(rovaniemi["YR--MODAHRMN"] >= 201706010000) & (rovaniemi["YR--MODAHRMN"] < 201707010000)]

In [30]:
print(f"First values in May, Kumpula:\n{kumpula_may.head()}\n")
print(f"Last values in May, Kumpula:\n{kumpula_may.tail()}")

First values in May, Kumpula:
    USAF  YR--MODAHRMN  TEMP  MAX  MIN  Celsius
0  29980  201705010000  37.0  NaN  NaN     2.78
1  29980  201705010100  37.0  NaN  NaN     2.78
2  29980  201705010200  37.0  NaN  NaN     2.78
3  29980  201705010300  37.0  NaN  NaN     2.78
4  29980  201705010400  39.0  NaN  NaN     3.89

Last values in May, Kumpula:
      USAF  YR--MODAHRMN  TEMP  MAX  MIN  Celsius
736  29980  201705311900  51.0  NaN  NaN    10.56
737  29980  201705312000  50.0  NaN  NaN    10.00
738  29980  201705312100  47.0  NaN  NaN     8.33
739  29980  201705312200  44.0  NaN  NaN     6.67
740  29980  201705312300  43.0  NaN  NaN     6.11


In [31]:
print(f"First values in June, Kumpula:\n{kumpula_june.head()}\n")
print(f"Last values in June, Kumpula:\n{kumpula_june.tail()}")

First values in June, Kumpula:
      USAF  YR--MODAHRMN  TEMP  MAX  MIN  Celsius
741  29980  201706010000  42.0  NaN  NaN     5.56
742  29980  201706010100  40.0  NaN  NaN     4.44
743  29980  201706010200  40.0  NaN  NaN     4.44
744  29980  201706010300  41.0  NaN  NaN     5.00
745  29980  201706010400  44.0  NaN  NaN     6.67

Last values in June, Kumpula:
       USAF  YR--MODAHRMN  TEMP  MAX  MIN  Celsius
1450  29980  201706301900  65.0  NaN  NaN    18.33
1451  29980  201706302000  61.0  NaN  NaN    16.11
1452  29980  201706302100  63.0  NaN  NaN    17.22
1453  29980  201706302200  62.0  NaN  NaN    16.67
1454  29980  201706302300  61.0  NaN  NaN    16.11


In [32]:
print(f"First values in May, Rovaniemi:\n{rovaniemi_may.head()}\n")
print(f"Last values in May, Rovaniemi:\n{rovaniemi_may.tail()}")

First values in May, Rovaniemi:
    USAF  YR--MODAHRMN  TEMP  MAX  MIN  Celsius
0  28450  201705010000  31.0  NaN  NaN    -0.56
1  28450  201705010020  30.0  NaN  NaN    -1.11
2  28450  201705010050  30.0  NaN  NaN    -1.11
3  28450  201705010100  31.0  NaN  NaN    -0.56
4  28450  201705010120  30.0  NaN  NaN    -1.11

Last values in May, Rovaniemi:
       USAF  YR--MODAHRMN  TEMP  MAX  MIN  Celsius
2215  28450  201705312220  32.0  NaN  NaN     0.00
2216  28450  201705312250  32.0  NaN  NaN     0.00
2217  28450  201705312300  33.0  NaN  NaN     0.56
2218  28450  201705312320  32.0  NaN  NaN     0.00
2219  28450  201705312350  30.0  NaN  NaN    -1.11


In [33]:
print(f"First values in June, Rovaniemi:\n{rovaniemi_june.head()}\n")
print(f"Last values in June, Rovaniemi:\n{rovaniemi_june.tail()}")

First values in June, Rovaniemi:
       USAF  YR--MODAHRMN  TEMP  MAX  MIN  Celsius
2220  28450  201706010000  32.0  NaN  NaN     0.00
2221  28450  201706010020  30.0  NaN  NaN    -1.11
2222  28450  201706010050  30.0  NaN  NaN    -1.11
2223  28450  201706010100  31.0  NaN  NaN    -0.56
2224  28450  201706010120  30.0  NaN  NaN    -1.11

Last values in June, Rovaniemi:
       USAF  YR--MODAHRMN  TEMP  MAX  MIN  Celsius
4342  28450  201706302220  57.0  NaN  NaN    13.89
4343  28450  201706302250  57.0  NaN  NaN    13.89
4344  28450  201706302300  56.0  NaN  NaN    13.33
4345  28450  201706302320  57.0  NaN  NaN    13.89
4346  28450  201706302350  57.0  NaN  NaN    13.89


### Part 4

Now you can make your temperature data from both locations and months easier to compare by printing out a few useful values.

- Use the `print()` function to show the mean, min and max Celsius temperatures for both places in May and June using the new subset dataframes (`kumpula_may`, `rovaniemi_may`, `kumpula_june`, and `rovaniemi_june`).
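
When the same statistics are needed for several subsets, a loop over a dictionary keeps the code short. The sketch below uses two made-up DataFrames (the labels and values are assumptions) and `Series.agg()` to compute the mean, min and max in one call.

```python
import pandas as pd

# Made-up subsets keyed by a descriptive label
subsets = {
    "Kumpula May": pd.DataFrame({"Celsius": [2.0, 4.0, 9.0]}),
    "Rovaniemi May": pd.DataFrame({"Celsius": [-1.0, 0.0, 4.0]}),
}

summary = {}
for label, df in subsets.items():
    # agg() with a list of function names returns a Series indexed by those names
    stats = df["Celsius"].agg(["mean", "min", "max"])
    summary[label] = stats
    print(
        f"{label}: mean {stats['mean']:.2f}, "
        f"min {stats['min']:.2f}, max {stats['max']:.2f}"
    )
```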

In [34]:
# Calculating mean, min and max for kumpula in May
kumpula_may_mean = kumpula_may["Celsius"].mean()
kumpula_may_min = kumpula_may["Celsius"].min()
kumpula_may_max = kumpula_may["Celsius"].max()

# Calculating mean, min and max for rovaniemi in May
rovaniemi_may_mean = rovaniemi_may["Celsius"].mean()
rovaniemi_may_min = rovaniemi_may["Celsius"].min()
rovaniemi_may_max = rovaniemi_may["Celsius"].max()

# Calculating mean, min and max for kumpula in June
kumpula_june_mean = kumpula_june["Celsius"].mean()
kumpula_june_min = kumpula_june["Celsius"].min()
kumpula_june_max = kumpula_june["Celsius"].max()

# Calculating mean, min and max for rovaniemi in June
rovaniemi_june_mean = rovaniemi_june["Celsius"].mean()
rovaniemi_june_min = rovaniemi_june["Celsius"].min()
rovaniemi_june_max = rovaniemi_june["Celsius"].max()

# Printing the results
print(f'The Mean, Min and Max for kumpula in May are {kumpula_may_mean:.2f}, {kumpula_may_min} & {kumpula_may_max} respectively')
print(f'The Mean, Min and Max for rovaniemi in May are {rovaniemi_may_mean:.2f}, {rovaniemi_may_min} & {rovaniemi_may_max} respectively')
print(f'The Mean, Min and Max for kumpula in June are {kumpula_june_mean:.2f}, {kumpula_june_min} & {kumpula_june_max} respectively')
print(f'The Mean, Min and Max for rovaniemi in June are {rovaniemi_june_mean:.2f}, {rovaniemi_june_min} & {rovaniemi_june_max} respectively')

The Mean, Min and Max for kumpula in May are 9.76, -2.22 & 22.78 respectively
The Mean, Min and Max for rovaniemi in May are 2.99, -7.22 & 15.0 respectively
The Mean, Min and Max for kumpula in June are 13.74, 2.78 & 23.89 respectively
The Mean, Min and Max for rovaniemi in June are 11.02, -1.11 & 23.33 respectively


## Problem 4 (*optional*) - Parsing daily temperatures

**This is an optional task for those who want more practice.**

This problem is more challenging as we provide only minimal instructions for completing the given tasks. You will need to search through the pandas documentation (and other resources) for help. We will cover data aggregation in more detail during Lesson 6, so this is a good opportunity to get a head start for next week!

In this problem, the aim is to aggregate the hourly temperature data for Helsinki Kumpula and Rovaniemi weather stations to the daily level. Currently, there are (at most) 3 measurements per hour in the data as you can see from the `YR--MODAHRMN` column (Year-Month-Day-Hour-Minute in Greenwich Mean Time (GMT)):

```
    USAF  YR--MODAHRMN  TEMP  MAX  MIN  Celsius
0  28450  201705010000  31.0  NaN  NaN       -1
1  28450  201705010020  30.0  NaN  NaN       -1
2  28450  201705010050  30.0  NaN  NaN       -1
3  28450  201705010100  31.0  NaN  NaN       -1
4  28450  201705010120  30.0  NaN  NaN       -1
```

The output should contain mean, max, and min Celsius temperatures for each day (for example, one mean temperature value for the 1st of May and so on).

### What to do

- Your task is to summarize the information for each day by aggregating (grouping) the DataFrame.
- The output should be a new DataFrame where you have calculated the mean, max, and min Celsius temperatures for each day separately based on hourly values.
- Repeat the task for the two data sets you created in Problem 2 (May-August temperatures from Rovaniemi and Kumpula).
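
One way to approach the aggregation, sketched on made-up data, is to derive a day label from the first eight characters of the timestamp and then combine `groupby()` with `agg()`. The `toy` DataFrame, the `DAY` column name and the result name `daily` are assumptions for illustration, not a required solution.

```python
import pandas as pd

# Made-up observations covering two days
toy = pd.DataFrame(
    {
        "YR--MODAHRMN": [201705010000, 201705010020, 201705020000, 201705020100],
        "Celsius": [-1.0, 1.0, 2.0, 6.0],
    }
)

# Work on a copy so the new column is added to an independent DataFrame
daily_input = toy.copy()

# The first 8 characters of the timestamp are the date (YYYYMMDD)
daily_input["DAY"] = daily_input["YR--MODAHRMN"].astype(str).str.slice(0, 8)

# One row per day with the mean, max and min Celsius temperature
daily = daily_input.groupby("DAY")["Celsius"].agg(["mean", "max", "min"])

print(daily)
```

The same steps can be repeated for each station's DataFrame, for example inside a small function that takes a DataFrame and returns the daily summary.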

Don't forget to:

- Include useful comments in your code
- Push your solution to GitHub



In [61]:
# Work on independent copies so that adding columns does not raise a SettingWithCopyWarning
kumpula_may = kumpula_may.copy()
rovaniemi_may = rovaniemi_may.copy()
kumpula_june = kumpula_june.copy()
rovaniemi_june = rovaniemi_june.copy()

# Convert the column type to string in a new column
kumpula_may["YR--MODAHRMN_STR"] = kumpula_may["YR--MODAHRMN"].astype(str)
rovaniemi_may["YR--MODAHRMN_STR"] = rovaniemi_may["YR--MODAHRMN"].astype(str)
kumpula_june["YR--MODAHRMN_STR"] = kumpula_june["YR--MODAHRMN"].astype(str)
rovaniemi_june["YR--MODAHRMN_STR"] = rovaniemi_june["YR--MODAHRMN"].astype(str)

# Creating a column and filling it with the daily date (YYYYMMDD)
kumpula_may["DAY"] = kumpula_may["YR--MODAHRMN_STR"].str.slice(start = 0, stop = 8)
rovaniemi_may["DAY"] = rovaniemi_may["YR--MODAHRMN_STR"].str.slice(start = 0, stop = 8)
kumpula_june["DAY"] = kumpula_june["YR--MODAHRMN_STR"].str.slice(start = 0, stop = 8)
rovaniemi_june["DAY"] = rovaniemi_june["YR--MODAHRMN_STR"].str.slice(start = 0, stop = 8)

# Grouping dataframe based on YEAR-MONTH-DAY
grouped_kumpula_may = kumpula_may.groupby(by = "DAY")
grouped_rovaniemi_may = rovaniemi_may.groupby(by = "DAY")
grouped_kumpula_june = kumpula_june.groupby(by = "DAY")
grouped_rovaniemi_june = rovaniemi_june.groupby(by = "DAY")

# Columns for which the daily mean is calculated
mean_cols = ["MAX", "MIN", "Celsius"]

# Aggregating data by day: daily mean of the MAX, MIN and Celsius columns
daily_data_kumpula_may = pd.DataFrame()
for key, group in grouped_kumpula_may:
    mean_values = group[mean_cols].mean()
    mean_values["DAY"] = key
    row = mean_values.to_frame().transpose()
    daily_data_kumpula_may = pd.concat([daily_data_kumpula_may, row])

daily_data_rovaniemi_may = pd.DataFrame()
for key, group in grouped_rovaniemi_may:
    mean_values = group[mean_cols].mean()
    mean_values["DAY"] = key
    row = mean_values.to_frame().transpose()
    daily_data_rovaniemi_may = pd.concat([daily_data_rovaniemi_may, row])

daily_data_kumpula_june = pd.DataFrame()
for key, group in grouped_kumpula_june:
    mean_values = group[mean_cols].mean()
    mean_values["DAY"] = key
    row = mean_values.to_frame().transpose()
    daily_data_kumpula_june = pd.concat([daily_data_kumpula_june, row])

daily_data_rovaniemi_june = pd.DataFrame()
for key, group in grouped_rovaniemi_june:
    mean_values = group[mean_cols].mean()
    mean_values["DAY"] = key
    row = mean_values.to_frame().transpose()
    daily_data_rovaniemi_june = pd.concat([daily_data_rovaniemi_june, row])

In [64]:
# Printing the daily averages for Kumpula in May
daily_data_kumpula_may.head()

Unnamed: 0,MAX,MIN,Celsius,DAY
0,49.0,39.0,7.524167,20170501
0,55.5,41.5,9.699583,20170502
0,55.0,42.5,9.212917,20170503
0,51.5,40.5,6.78125,20170504
0,54.5,40.0,10.323333,20170505


In [65]:
# Printing the daily averages for Rovaniemi in May
daily_data_rovaniemi_may.head()

Unnamed: 0,MAX,MIN,Celsius,DAY
0,39.0,30.5,2.198889,20170501
0,43.5,35.5,3.433333,20170502
0,38.0,35.0,2.10493,20170503
0,42.5,32.5,4.390417,20170504
0,51.0,40.0,6.905972,20170505


In [66]:
# Printing the daily averages for Kumpula in June
daily_data_kumpula_june.head()

Unnamed: 0,MAX,MIN,Celsius,DAY
0,52.5,41.0,6.55125,20170601
0,46.5,39.5,6.24875,20170602
0,54.5,42.5,10.162083,20170603
0,59.0,44.0,9.999167,20170604
0,53.0,48.5,10.278333,20170605


In [67]:
# Printing the daily averages for Rovaniemi in June
daily_data_rovaniemi_june.head()

Unnamed: 0,MAX,MIN,Celsius,DAY
0,43.0,32.0,2.175694,20170601
0,39.5,34.0,2.707917,20170602
0,40.5,35.0,3.479861,20170603
0,42.0,36.0,3.603333,20170604
0,47.5,37.0,7.23,20170605


# Done!