# 1. Introduction

The data is from <a href='https://www.ncdc.noaa.gov/noaa-merged-land-ocean-global-surface-temperature-analysis-noaaglobaltemp-v5'> NOAA Merged Land Ocean Global Surface Temperature Analysis Dataset (NOAAGlobalTemp)</a>, which merges two dataset to form one land–ocean surface temperature analysis (formerly known as MLOST). NOAAGlobalTEmp consists of a spatially gridded (5°× 5°) global surface temperature dataset, with monthly resolution from January 1880 to present.

The objectives of this project include:

(a) Plot the anomaly time series of temperature history from 10 different cities in a given specific time frame;

(b) Map the anomaly global temperature for a given point in time;

(c) Map the anomaly global mean temperature.

* All the temperature in this project is anomaly teperature.

# 2. Methods

## 2.1 Data importing
<p>The data imported include:</p>


<p>a) <a href='https://www.ncei.noaa.gov/data/noaa-global-surface-temperature/v5/access/gridded/'>5°× 5°gridded data of anomaly temperature</a></p>

<p>The dataset was downloaded in the format of netCDF. Module netCDF4 was used to open it.</p>

<p>b) <a href='https://simplemaps.com/resources/free-country-cities'>city coordinates dataset including the longitudes and latitudes of cities</a></p>

<p>The dataset contains latidutes, longitudes, population and other information of 15,492 major cities worldwide. Module pandas was used to read this csv file. Only the latidutes and longitudes of the first ten cities were used in this project.</p>

## 2.2 Data preprocessing
<p>At the very beginning, only the 5°× 5°gridded data of anomaly temperature was preprocessed, since its original data were in formats that could not be used by Basemap directly.</p>
<p>Since later the global mean temperatures, including the ones by year and by month, were required to be plotted. The mean temperatures of each year and each month were supposed to be calculated. Due to missing values in the gridded dataset, numpy.nanmean was used to calculate mean values, which ignored missing values.</p>
    
## 2.3 Data visualization
**2.3.1 Plot the time series of temperature history from 10 different cities in a given specific time frame**
<p>With the latitudes and longitudes provided by the city coordinate dataset, the first 10 cities could be located on the 5°× 5°grid. And by inputing the start year, start month, end year, end month, a series of temperatures of these cities could be extracted from the gridded dataset. Using these data, the time series of temperature history could be plotted using matplotlib.pyplot. By constructing such a function taking city names and time frames as inputs, and further interacting it with ipython widgets, users could use drop down menus to select which city and time frame (between 1880.1 - 2020.2) they would like the plot to show.</p>
    
**2.3.2 Map the global temperature for a given point in time**
<p>With the matplotlib Basemap module, a world map could be constructed and the 5°× 5°gridded data could be mapped on it. Still, the interactive widgets were connected to the plotting function, so users could use drop down menus to select a time point, including year and month between 1880.1 - 2020.2, and the plot would show the global temperature at that time on the map. The grids with missing temperature would not have color.</p>

**2.3.3 Map the global mean temperature**
<p>With the arrays storing mean temperatures obtained, functions plotting the mean temperatures of each grids were connected with interactive widgets. Hence, for annual mean temperature plots, users could use drop down menus to select a specific year between 1880 and 2020, and the function would plot the mean temperature of each grid in that year. Similarly, for monthly mean temperatures, users could select a specific month from Jan to Dec, and the function would plot the mean temperature of each grid in every year. </p>

# 3. Implementation

At the very beginning, import all modules used in this project.

In [1]:
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
import ipywidgets as widgets
from netCDF4 import Dataset
import datetime
from mpl_toolkits.basemap import Basemap
pd.plotting.register_matplotlib_converters()
%matplotlib inline

## 3.1 Data importing

The data imported include:


a) gridded data of anomaly temperature

In [2]:
# the gridded tempreture (netCDF data)
nc = Dataset('data\\NOAAGlobalTemp_v5.0.0_gridded_s188001_e202002_c20200308T133325.nc','r')

# look through the heads and dimensions of dataset
for i in nc.variables:
    print([i, nc.variables[i].shape])

['time', (1682,)]
['lat', (36,)]
['lon', (72,)]
['z', (1,)]
['anom', (1682, 1, 36, 72)]


In [3]:
time = nc.variables['time'][:]
lat = nc.variables['lat'][:]
lon = nc.variables['lon'][:]
temp = nc.variables['anom'][:]
nc.close()

b) city coordinates dataset including the longitudes and latitudes of cities

In [4]:
# load the city coordinates data
ct = pd.read_csv('data\\worldcities.csv')
ct = ct.loc[:,['city', 'lat', 'lng']]

# select the information of first 10 cities to use
print(ct.head(10))
ct.index = ct['city']

          city      lat       lng
0        Tokyo  35.6850  139.7514
1     New York  40.6943  -73.9249
2  Mexico City  19.4424  -99.1310
3       Mumbai  19.0170   72.8570
4    São Paulo -23.5587  -46.6250
5        Delhi  28.6700   77.2300
6     Shanghai  31.2165  121.4365
7      Kolkata  22.4950   88.3247
8  Los Angeles  34.1139 -118.4068
9        Dhaka  23.7231   90.4086


## 3.2 Data preprocessing

**3.2.1 time preprocessing**

There are 1682 time points in the gridded dataset. Since the time points are monthly from Jan 1880 to present and the differences among them are always around 30 (as follows), we defined lists of years and months in regular calendar seperately from 1880.01 to 2020.02. These two lists also contain 1682 elements, which are the corresponding year and month of each month.

In [5]:
time

masked_array(data=[29219., 29250., 29279., ..., 80322., 80353., 80384.],
             mask=False,
       fill_value=1e+20,
            dtype=float32)

In [6]:
# define the time in regular calendar
# year
year = []
for i in range(1880, 2020):
    for j in range(12):
        year.append(i)
year.append(2020)
year.append(2020)

# month
month = [i for i in range(1,13)]*140
month.append(1)
month.append(2)

**3.2.2 latitudes and longitudes preprocessing**

First, checked the original latitudes and longtidues to see if they satisfies the coordinates used by Basemap. Since Basemap takes nothern latitudes (0 ~ 90) and eastern longitudes (0 ~ 180) as postive, and takes southern latitudes (0 ~ -90) and western longtitudes (0 ~ -180) as negative, the longitudes were subtracted by 180. And because later on, in order to use index to slice the data easily, these masked arrays were converted to lists.


In [7]:
# Original coordinate data, latitudes
lat

masked_array(data=[-87.5, -82.5, -77.5, -72.5, -67.5, -62.5, -57.5, -52.5,
                   -47.5, -42.5, -37.5, -32.5, -27.5, -22.5, -17.5, -12.5,
                    -7.5,  -2.5,   2.5,   7.5,  12.5,  17.5,  22.5,  27.5,
                    32.5,  37.5,  42.5,  47.5,  52.5,  57.5,  62.5,  67.5,
                    72.5,  77.5,  82.5,  87.5],
             mask=False,
       fill_value=1e+20,
            dtype=float32)

In [8]:
# Original coordinate data, longitudes
lon

masked_array(data=[  2.5,   7.5,  12.5,  17.5,  22.5,  27.5,  32.5,  37.5,
                    42.5,  47.5,  52.5,  57.5,  62.5,  67.5,  72.5,  77.5,
                    82.5,  87.5,  92.5,  97.5, 102.5, 107.5, 112.5, 117.5,
                   122.5, 127.5, 132.5, 137.5, 142.5, 147.5, 152.5, 157.5,
                   162.5, 167.5, 172.5, 177.5, 182.5, 187.5, 192.5, 197.5,
                   202.5, 207.5, 212.5, 217.5, 222.5, 227.5, 232.5, 237.5,
                   242.5, 247.5, 252.5, 257.5, 262.5, 267.5, 272.5, 277.5,
                   282.5, 287.5, 292.5, 297.5, 302.5, 307.5, 312.5, 317.5,
                   322.5, 327.5, 332.5, 337.5, 342.5, 347.5, 352.5, 357.5],
             mask=False,
       fill_value=1e+20,
            dtype=float32)

In [9]:
# preprocessing the longitude 
lon = list(lon-180)
lat = list(lat)

**3.2.3 anomaly temperature preprocessing**

Originally, temp was the anomaly temprature from gridded dataset, which is a 1682×1×36×72 masked array. Whan mask = True, the corresponding data is missing or invalid. For example, the first element of the first dimension is as follows. Whenever there is a 'True' is the mask, there is a '--' (missing value) in the data.

In [10]:
temp[0,0,:,:]

masked_array(
  data=[[--, --, --, ..., --, --, --],
        [--, --, --, ..., --, --, --],
        [--, --, --, ..., --, --, --],
        ...,
        [--, --, -0.31496092677116394, ..., --, --, --],
        [--, --, --, ..., --, --, --],
        [--, --, --, ..., --, --, --]],
  mask=[[ True,  True,  True, ...,  True,  True,  True],
        [ True,  True,  True, ...,  True,  True,  True],
        [ True,  True,  True, ...,  True,  True,  True],
        ...,
        [ True,  True, False, ...,  True,  True,  True],
        [ True,  True,  True, ...,  True,  True,  True],
        [ True,  True,  True, ...,  True,  True,  True]],
  fill_value=-999.9,
  dtype=float32)

## 3.3 Data visualization

**3.3.1 Plot the time series of temperature history from 10 different cities in a given specific time frame**

First, built a function that takes the city names and time frames as inputs. Then, connected the function with ipython widgets to make a drop down choice for users.

In [11]:
def tsplot(start_year, start_month, end_year, end_month, city):
    '''
    This function generate the time series plot for a specific city in a time period.
    start_year: the number of year that starts the time frame. e.g. 1880
    start_month: the number of month that starts the time frame. e.g. 1
    end_year: the number of year that ends the time frame. e.g. 2020
    end_month:the number of month that ends the time frame. e.g. 02
    city: the name string of the city. e.g. 'Los Angeles'
    
    '''
    # get the index of the time range
    startind = (start_year - 1880)*12 + start_month - 1    
    endind = (end_year - 1880)*12 + end_month 
    
    # find the 5x5 grid the city locates
    city_lat = ct.loc[city, "lat"]
    formerlatind = max([lat.index(i) for i in lat if i < city_lat])
    if (city_lat - lat[formerlatind]) <= 2.5:
        lat_index = formerlatind
    else:
        lat_index = formerlatind + 1
    
    city_lon = ct.loc[city, "lng"]
    formerlonind = max([lon.index(i) for i in lon if i < city_lon])
    if abs(city_lon - lon[formerlonind]) <= 2.5:
        lon_index = formerlonind
    else:
        lon_index = formerlonind + 1
       
    plt.figure(figsize = (10,5)) 
    x = [datetime.datetime.strptime(str(year[i])+str(month[i]), '%Y%m').date() for i in range(startind, endind)]
    y = temp[startind:endind, 0, lat_index, lon_index]
    plt.title('Time series of {} from {}.{} to {}.{}'.format(city, start_year, start_month, end_year, end_month), fontsize=20)
    plt.xlabel('Time', fontsize = 15)
    plt.ylabel('Temperature ℃', fontsize = 15)
    plt.xticks(rotation=90, fontsize = 15)
    plt.plot(x, y)


dropstart_year = widgets.Dropdown(options=[i for i in range(1880, 2021)],value=1880,description='Start year:')
dropstart_month = widgets.Dropdown(options=[i for i in range(1, 13)],value=1,description='Start month:')
dropend_year = widgets.Dropdown(options=[i for i in range(1880, 2021)],value=2020,description='End year:')
dropend_month = widgets.Dropdown(options=[i for i in range(1, 13)],value=2,description='End month:')
dropcity = widgets.Dropdown(options=[i for i in ct.iloc[:10, 0]], value='Los Angeles',description='City')

timeseriesplot = widgets.interactive(tsplot, {'manual': True}, city=dropcity, start_year=dropstart_year, start_month=dropstart_month, end_year=dropend_year, end_month=dropend_month)

**3.3.2 Map the global temperature for a given point in time**

First, built a function that takes the time point (year and month) as inputs, and uses matplotlib Basemap to map the global temperatures at that time point on the world map. Then, connected the function with ipython widgets to make a drop down choice for users.

In [12]:
def globaltemp(Y, M):
    '''
    This function maps the world temperature on the world map at a specific time point, year and month.
    Y: the year
    M: the month
    '''
    
    # build the world map
    plt.figure(figsize=(20,20))
    mp = Basemap()

    lons, lats = np.meshgrid(lon, lat)
    x, y = mp(lons, lats)
    
    # the index of the time point in the gridded dataset
    num = year.index(Y) + M - 1
    
    # map the data on the world map
    c_scheme = mp.pcolor(x, y, np.squeeze(temp[num,:,:]), cmap = 'jet')

    mp.drawcoastlines()
    mp.colorbar(c_scheme, location = 'right', pad = '10%')
    plt.title('Global Temperature Map of {} - {} (℃)'.format(Y, M), fontsize=30)
    plt.show()

# the dropdown interactive widgets
dropyear = widgets.Dropdown(options=[i for i in range(1880, 2021)],value=1880,description='Year:')
dropmonth = widgets.Dropdown(options=[i for i in range(1, 13)],value=1,description='Month:')

# connect the function and ipython widgets together
globaltempplot = widgets.interactive(globaltemp, {'manual': True}, Y = dropyear, M = dropmonth)

**3.3.3 Plot the global mean temperature**

**a) Map the annual global mean temperature**

First, built a 141×36×72 array called annuallytemp to store the annual mean tempratures of each grids in a 141 year range.

In [29]:
# annually
annuallytemp = np.zeros(((2020-1880+1), 36, 72))

for i in range(1682):
    monthi = i%12
    if i <= 1680:
        if monthi == 11:
            yeari = int((i - monthi)/12)
            annuallytemp[yeari,:,:] = np.nanmean(temp[(i-11):(i+1),0,:,:], axis = 0) # ignore the missing values
    else:
        if monthi == 1:
            annuallytemp[(2020-1880),:,:] = np.nanmean(temp[(i-1):(i+1),0,:,:], axis = 0)

Then, built a function ploting the annual global mean temperatures and connected it with interactive widgets.

In [77]:
def annually(Y):
    '''
    This function maps the annually world temperature on the world map
    Y: the year
    '''
    plt.figure(figsize=(20,20))
    mpannual = Basemap()

    lons, lats = np.meshgrid(lon, lat)
    x, y = mpannual(lons, lats)
    
    c_scheme = mpannual.pcolor(x, y, np.squeeze(annuallytemp[(Y-1880),:,:]), cmap = 'jet')

    mpannual.drawcoastlines()
    mpannual.colorbar(c_scheme, location = 'right', pad = '10%')
    plt.title('Annually Global Temperature Map of {} (℃)'.format(Y), fontsize=30)
    plt.show()

dropyear = widgets.Dropdown(options=[i for i in range(1880, 2021)],value=1880,description='Year:')

annualavgplot = widgets.interactive(annually, Y = dropyear)


**b) Map the monthly global mean temperature**

First, built a 12×36×72 array called monthlytemp to store the month mean tempratures of each grids in a 12 month range.

In [78]:
# monthly global mean temperature
monthlytemp = np.zeros((12, 36, 72))

for i in range(12):
    monthlist = [k for k in range(i, 1682, 12)]
    monthlytemp[i,:,:] = np.nanmean(temp[monthlist,:,:], axis = 0) # ignore missing values

Then, built a function ploting the monthly global mean temperatures and connected it with interactive widgets.

In [79]:
monthNames = ["January", "February", "March", "April", "May", "June", "July", "August", "September", "October", "November", "December"]
    
def monthly(M):
    '''
    This function maps the monthly world temperature on the world map
    M: the month
    '''
    plt.figure(figsize=(20,20))
    mpmonth = Basemap()

    lons, lats = np.meshgrid(lon, lat)
    x, y = mpmonth(lons, lats)
    
    monthind = monthNames.index(M)
    c_scheme = mpmonth.pcolor(x, y, np.squeeze(monthlytemp[monthind,:,:]), cmap = 'jet')

    mpmonth.drawcoastlines()
    mpmonth.colorbar(c_scheme, location = 'right', pad = '10%')
    plt.title('Monthly Global Temperature Map of {} (℃)'.format(M), fontsize=30)
    plt.show()

dropmonth = widgets.Dropdown(options=monthNames,value="January",description='Month:')

monthavgplot = widgets.interactive(monthly, M = dropmonth)

# 4. Testing

## 4.1 Data preprocessing

First, check if the lists of year and month provide correct values corresponding every month from 1880.1 to 2020.2. It's correct.

In [80]:
print("The first 20 elements of year: ", year[:20], "\n")
print("The last 20 elements of year: ", year[-20:], "\n")
print("The first 20 elements of month: ", month[:20],"\n")
print("The last 20 elements of month: ", month[-20:])

The first 20 elements of year:  [1880, 1880, 1880, 1880, 1880, 1880, 1880, 1880, 1880, 1880, 1880, 1880, 1881, 1881, 1881, 1881, 1881, 1881, 1881, 1881] 

The last 20 elements of year:  [2018, 2018, 2018, 2018, 2018, 2018, 2019, 2019, 2019, 2019, 2019, 2019, 2019, 2019, 2019, 2019, 2019, 2019, 2020, 2020] 

The first 20 elements of month:  [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 1, 2, 3, 4, 5, 6, 7, 8] 

The last 20 elements of month:  [7, 8, 9, 10, 11, 12, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 1, 2]


Second, check if the coordinate data satisfy the range of Basemap: latitude from -90 to 90, longitude from -180 to 180. And check if they construct 5°× 5°grids (The differences between adjacent numbers are always 5). It's correct.

In [81]:
print("the latitude data are: \n", lat, "\n")
print("the longitude data are: \n", lon)

the latitude data are: 
 [-87.5, -82.5, -77.5, -72.5, -67.5, -62.5, -57.5, -52.5, -47.5, -42.5, -37.5, -32.5, -27.5, -22.5, -17.5, -12.5, -7.5, -2.5, 2.5, 7.5, 12.5, 17.5, 22.5, 27.5, 32.5, 37.5, 42.5, 47.5, 52.5, 57.5, 62.5, 67.5, 72.5, 77.5, 82.5, 87.5] 

the longitude data are: 
 [-177.5, -172.5, -167.5, -162.5, -157.5, -152.5, -147.5, -142.5, -137.5, -132.5, -127.5, -122.5, -117.5, -112.5, -107.5, -102.5, -97.5, -92.5, -87.5, -82.5, -77.5, -72.5, -67.5, -62.5, -57.5, -52.5, -47.5, -42.5, -37.5, -32.5, -27.5, -22.5, -17.5, -12.5, -7.5, -2.5, 2.5, 7.5, 12.5, 17.5, 22.5, 27.5, 32.5, 37.5, 42.5, 47.5, 52.5, 57.5, 62.5, 67.5, 72.5, 77.5, 82.5, 87.5, 92.5, 97.5, 102.5, 107.5, 112.5, 117.5, 122.5, 127.5, 132.5, 137.5, 142.5, 147.5, 152.5, 157.5, 162.5, 167.5, 172.5, 177.5]


## 4.2 Mean temperature calculation

First, check the annual mean temperature. The shape is 141×36×72, correct. Then take the year 1880 as an example, to see if the mean teperature of grid (12.5,-77.5) (index (20, 20), without missing value) and the mean temprature of grid (-52.5, -72.5) (index (7, 21), with missing values) are the same as that calculated directly. They are correct.

In [82]:
print(annuallytemp.shape)
assert np.nanmean(temp[:12, 0, 20, 20]) - annuallytemp[0,20,20] < 1e-14
assert np.nanmean(temp[:12, 0, 7, 21]) - annuallytemp[0,7,21] < 1e-14

(141, 36, 72)


Second, check the monthly mean temperature. The shape is 12×36×72, correct. Then take the month Jan as an example, to see if the mean teperature of grid (12.5,-77.5) (index (20, 20), without missing value) and the mean temprature of grid (-52.5, -72.5) (index (7, 21), with missing values) are the same as that calculated directly. They are correct.

In [83]:
print(monthlytemp.shape)
Jan = [i for i in range(0, 1682, 12)]
assert np.nanmean(temp[Jan, 0, 20, 20]) - monthlytemp[0,20,20] < 1e-14
assert np.nanmean(temp[Jan, 0, 7, 21]) - monthlytemp[0,7,21] < 1e-14

(12, 36, 72)


# 5. Results

Since there is no problem in the testing of data, three required plots were plotted.

P.S. The plots are only shown in Jupyter notebook.

## 5.1 Plot the time series of temperature history from 10 different cities in a given specific time frame

In [85]:
timeseriesplot

interactive(children=(Dropdown(description='Start year:', index=3, options=(1880, 1881, 1882, 1883, 1884, 1885…

## 5.2 Map the global temperature for a given point in time

In [86]:
globaltempplot

interactive(children=(Dropdown(description='Year:', options=(1880, 1881, 1882, 1883, 1884, 1885, 1886, 1887, 1…

## 5.3 Plot the global mean temperature

**a) Map the annual global mean temperature**

In [87]:
annualavgplot

interactive(children=(Dropdown(description='Year:', options=(1880, 1881, 1882, 1883, 1884, 1885, 1886, 1887, 1…

**b) Map the monthly global mean temperature**

In [88]:
monthavgplot

interactive(children=(Dropdown(description='Month:', options=('January', 'February', 'March', 'April', 'May', …