## Challenge 422

As spring emerges in the Northern Hemisphere, this weekly challenge looks at temperatures. The dataset contains global temperature records: monthly average temperatures in degrees Celsius for major cities in countries around the world, from 1743 to 2013.

To solve this challenge, we will concentrate on the data from 1950 onwards.

Your tasks are as follows:

1. Determine which cities have average temperatures greater than or equal to 25 degrees Celsius.
2. Among the cities identified in the previous task, identify the country with the highest number of such cities.
3. Examining all countries within the dataset, pinpoint the year with the highest average temperature and the year with the lowest average temperature across the globe.

Source: Global Temperature Records (1743-2013) (kaggle.com)


In [2]:
import pandas as pd

In [3]:
df = pd.read_csv(r'C:\Users\LeLuu\Documents\Python_Practice\Challenge_422\GlobalLandTemperaturesByMajorCityNew.csv')

In [4]:
df

,dt,AverageTemperature,AverageTemperatureUncertainty,City,Country,Latitude,Longitude
0,1849-01-01,26.704,1.435,Abidjan,Cote d'Ivoire,5.63N,3.23W
1,1849-02-01,27.434,1.362,Abidjan,Cote d'Ivoire,5.63N,3.23W
2,1849-03-01,28.101,1.612,Abidjan,Cote d'Ivoire,5.63N,3.23W
3,1849-04-01,26.140,1.387,Abidjan,Cote d'Ivoire,5.63N,3.23W
4,1849-05-01,25.427,1.200,Abidjan,Cote d'Ivoire,5.63N,3.23W
...,...,...,...,...,...,...,...
239172,2013-05-01,18.979,0.807,Xian,China,34.56N,108.97E
239173,2013-06-01,23.522,0.647,Xian,China,34.56N,108.97E
239174,2013-07-01,25.251,1.042,Xian,China,34.56N,108.97E
239175,2013-08-01,24.528,0.840,Xian,China,34.56N,108.97E


In [5]:
#Convert the 'dt' column to datetime format
df['dt'] = pd.to_datetime(df['dt'])
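The conversion above relies on pandas inferring the date format. A minimal sketch of a stricter variant, assuming the `dt` strings follow the `YYYY-MM-DD` pattern seen in the preview; the sample values and the names `raw_dates`, `parsed`, and `n_bad` are hypothetical and only illustrate the call:

```python
import pandas as pd

# Hypothetical sample strings; the last one is deliberately malformed
raw_dates = pd.Series(["1849-01-01", "2013-08-01", "not a date"])

# An explicit format avoids per-value format guessing;
# errors="coerce" turns unparseable entries into NaT instead of raising
parsed = pd.to_datetime(raw_dates, format="%Y-%m-%d", errors="coerce")

# Count the entries that failed to parse
n_bad = int(parsed.isna().sum())
```

Checking `n_bad` before filtering is a cheap way to notice rows that would otherwise drop out of date comparisons without any error.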

In [6]:
#Filter the DataFrame for dates from January 1, 1950 onwards
filtered_df = df[df.dt >= '1950-01-01']
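A boolean filter like the one above returns a frame that pandas may treat as a slice of the original, and assigning a new column to it later can emit a `SettingWithCopyWarning`. A minimal sketch of one way to avoid that, taking an explicit copy at filter time; the three sample rows are copied from the previews in this notebook, and the names `sample_df` and `recent_df` are hypothetical:

```python
import pandas as pd

# Hypothetical sample: three rows taken from the previews in this notebook
sample_df = pd.DataFrame({
    "dt": pd.to_datetime(["1849-01-01", "1950-01-01", "2013-08-01"]),
    "AverageTemperature": [26.704, 26.773, 24.528],
    "City": ["Abidjan", "Abidjan", "Xian"],
})

# .copy() makes the filtered frame independent of sample_df
recent_df = sample_df[sample_df["dt"] >= "1950-01-01"].copy()

# Adding a column now modifies recent_df only
recent_df["Year"] = recent_df["dt"].dt.year
```

The copy costs some memory, but it makes later column assignments unambiguous.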

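### 1. Determine which cities have average temperatures greater than or equal to 25 degrees.

In [7]:
#Group by city and country, then compute the average temperature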
avg_temp_df = filtered_df.groupby(['City','Country']).agg({'AverageTemperature': 'mean'})

In [8]:
#Keep the cities (with their countries) whose average temperature is greater than or equal to 25 degrees Celsius
q1_ans = avg_temp_df[avg_temp_df['AverageTemperature'] >= 25].reset_index()
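The group-then-threshold pattern is easier to verify on a tiny frame. A minimal sketch on made-up data (the cities `A` and `B`, the countries `X` and `Y`, and all temperatures are hypothetical); it also shows that `mean` skips missing values by default:

```python
import pandas as pd

# Hypothetical toy data, not taken from the dataset
toy = pd.DataFrame({
    "City": ["A", "A", "A", "B", "B"],
    "Country": ["X", "X", "X", "Y", "Y"],
    "AverageTemperature": [26.0, 28.0, float("nan"), 10.0, 20.0],
})

# Mean per (City, Country); the NaN reading for city A is ignored
city_means = toy.groupby(["City", "Country"], as_index=False)["AverageTemperature"].mean()

# Keep only the cities whose mean reaches the 25 degree threshold
hot_cities = city_means[city_means["AverageTemperature"] >= 25].reset_index(drop=True)
```

Here city `A` averages 27.0 from its two valid readings and is kept, while city `B` averages 15.0 and is dropped.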

In [9]:
q1_ans

,City,Country,AverageTemperature
0,Abidjan,Cote d'Ivoire,26.518292
1,Ahmadabad,India,26.982792
2,Bangalore,India,25.290026
3,Bangkok,Thailand,27.633291
4,Bombay,India,27.066308
5,Calcutta,India,26.521914
6,Dar Es Salaam,Tanzania,26.011661
7,Delhi,India,25.618832
8,Dhaka,Bangladesh,25.961742
9,Fortaleza,Brazil,27.364901


### 2. Among the cities identified in the previous task, identify the country with the highest number of such cities.

In [10]:
#Count the number of cities for each country and keep the country with the most
q1_ans.groupby('Country').agg({'City':'count'}).sort_values(by='City', ascending=False).head(1).reset_index()

,Country,City
0,India,14


In [11]:
q1_ans.groupby('Country').agg({'City':'count'}).sort_values(by='City', ascending=False)

Country,City
India,14
Nigeria,3
Saudi Arabia,2
Brazil,2
Indonesia,2
Bangladesh,1
Dominican Republic,1
Burma,1
Cote d'Ivoire,1
Pakistan,1
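The same ranking can also be sketched with `value_counts`, which counts and sorts in one call. The frame below holds only the ten rows displayed in the `q1_ans` preview, so India's count here is 5 rather than the 14 found in the full result; the names `hot`, `counts`, `top_country`, and `top_count` are hypothetical:

```python
import pandas as pd

# Only the ten rows shown in the q1_ans preview, not the full result
hot = pd.DataFrame({
    "City": ["Abidjan", "Ahmadabad", "Bangalore", "Bangkok", "Bombay",
             "Calcutta", "Dar Es Salaam", "Delhi", "Dhaka", "Fortaleza"],
    "Country": ["Cote d'Ivoire", "India", "India", "Thailand", "India",
                "India", "Tanzania", "India", "Bangladesh", "Brazil"],
})

# Number of qualifying cities per country, sorted in descending order
counts = hot["Country"].value_counts()

# Country with the most qualifying cities, and how many it has
top_country = counts.idxmax()
top_count = int(counts.max())
```

This counts rows per country, which equals the number of cities as long as each city appears once per country, as it does after the group-by in task 1.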


### 3. Examining all countries within the dataset, pinpoint the year with the highest average temperature and the year with the lowest average temperature across the globe.

In [12]:
filtered_df

,dt,AverageTemperature,AverageTemperatureUncertainty,City,Country,Latitude,Longitude
1212,1950-01-01,26.773,0.239,Abidjan,Cote d'Ivoire,5.63N,3.23W
1213,1950-02-01,27.527,0.348,Abidjan,Cote d'Ivoire,5.63N,3.23W
1214,1950-03-01,28.344,0.431,Abidjan,Cote d'Ivoire,5.63N,3.23W
1215,1950-04-01,27.830,0.467,Abidjan,Cote d'Ivoire,5.63N,3.23W
1216,1950-05-01,26.896,0.248,Abidjan,Cote d'Ivoire,5.63N,3.23W
...,...,...,...,...,...,...,...
239172,2013-05-01,18.979,0.807,Xian,China,34.56N,108.97E
239173,2013-06-01,23.522,0.647,Xian,China,34.56N,108.97E
239174,2013-07-01,25.251,1.042,Xian,China,34.56N,108.97E
239175,2013-08-01,24.528,0.840,Xian,China,34.56N,108.97E


In [13]:
#Add a 'Year' column; assign() returns a new DataFrame, which avoids the SettingWithCopyWarning raised by assigning to a filtered slice
filtered_df = filtered_df.assign(Year=filtered_df['dt'].dt.year)


In [14]:
#Average all records per year, sort in descending order, and keep the warmest year
highest_temp = filtered_df.groupby(filtered_df['dt'].dt.year).agg({'AverageTemperature':'mean'}).sort_values(by='AverageTemperature', ascending=False).head(1).reset_index()

In [15]:
highest_temp.rename(columns={'dt':'Highest Year','AverageTemperature':'Highest Temperature'}, inplace=True)

In [16]:
highest_temp

,Highest Year,Highest Temperature
0,2013,20.263006


In [17]:
#Average all records per year, sort in ascending order, and keep the coldest year
lowest_temp = filtered_df.groupby(filtered_df['dt'].dt.year).agg({'AverageTemperature':'mean'}).sort_values(by='AverageTemperature', ascending=True).head(1).reset_index()

In [18]:
lowest_temp.rename(columns={'dt':'Lowest Year','AverageTemperature':'Lowest Temperature'}, inplace=True)

In [19]:
lowest_temp

,Lowest Year,Lowest Temperature
0,1956,18.480926


In [20]:
pd.concat([highest_temp, lowest_temp], axis=1)

,Highest Year,Highest Temperature,Lowest Year,Lowest Temperature
0,2013,20.263006,1956,18.480926
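One caveat about the yearly comparison: the last rows in the preview are dated 2013-08-01, so 2013 may not cover all twelve months, and a year without its colder months can look warmer than it was. A minimal sketch of a coverage check on made-up data follows; the toy values, the threshold `expected_months`, and all variable names are hypothetical, and for the real dataset the threshold would presumably be 12:

```python
import pandas as pd

# Hypothetical toy data: 2002 has a single (warm) month on record
toy = pd.DataFrame({
    "dt": pd.to_datetime(["2000-01-01", "2000-07-01",
                          "2001-01-01", "2001-07-01",
                          "2002-07-01"]),
    "AverageTemperature": [10.0, 20.0, 12.0, 22.0, 21.0],
})
year = toy["dt"].dt.year

# Mean temperature per year, and number of distinct months observed per year
yearly_mean = toy.groupby(year)["AverageTemperature"].mean()
months_per_year = toy["dt"].dt.month.groupby(year).nunique()

# Keep only the years with full coverage (2 months in this toy example)
expected_months = 2
complete = yearly_mean[months_per_year >= expected_months]

# idxmax / idxmin return the year labels directly, with no sorting needed
naive_hottest = yearly_mean.idxmax()
hottest_year = complete.idxmax()
coldest_year = complete.idxmin()
```

In the toy data the naive maximum picks the partial year 2002, while restricting to complete years gives 2001 as the warmest and 2000 as the coldest. Whether the same effect changes the 2013 result above depends on the actual month coverage in the dataset.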
