# Get monthly average weather station data (Global)

Extract station data (including station coordinates) for the countries of interest from the
[Global Historical Climatology Network - Monthly (GHCNM) Version 4](https://www.ncdc.noaa.gov/data-access/land-based-station-data/land-based-datasets/global-historical-climatology-network-monthly-version-4).

The dataset README is available [here](https://www1.ncdc.noaa.gov/pub/data/ghcn/v4/readme.txt).

To run this notebook, download the compressed archives (`ghcnm.*.tar.gz`) from [here](https://www1.ncdc.noaa.gov/pub/data/ghcn/v4/) and extract the `.dat` (data) and `.inv` (station inventory) files they contain.
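The archive can also be unpacked from Python. The sketch below assumes the tarball has already been downloaded to a local path (the archive name in the usage comment is a placeholder, not a real file name) and uses the standard-library `tarfile` module to extract only the `.dat` and `.inv` members. `extract_ghcnm_archive` is a hypothetical helper, not part of `ghcnm`.

```python
import tarfile
from pathlib import Path


def extract_ghcnm_archive(archive_path, dest_dir="."):
    """Extract only the .dat and .inv members of a GHCN-M .tar.gz archive.

    Returns the paths of the extracted files.
    """
    dest = Path(dest_dir)
    # Use the safe 'data' extraction filter where this Python version supports it
    kwargs = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}
    extracted = []
    with tarfile.open(archive_path, "r:gz") as tar:
        for member in tar.getmembers():
            # Skip directories, the README and anything else that is not data/inventory
            if member.isfile() and member.name.endswith((".dat", ".inv")):
                tar.extract(member, path=dest, **kwargs)
                extracted.append(dest / member.name)
    return extracted


# Usage (placeholder archive name; substitute the file you downloaded):
# paths = extract_ghcnm_archive("path/to/ghcnm_archive.tar.gz")
```

Parent directories inside the archive (such as `ghcnm.v4.0.1.20200205/`) are recreated on extraction, so the `glob` patterns used below still match.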

```python
import numpy as np
import pandas as pd
import glob
import ghcnm  # helper module providing get_stn_metadata() and get_data()
```

### Names of the original GHCN-M data and inventory files

Source: https://www1.ncdc.noaa.gov/pub/data/ghcn/v4/

```python
# Locate the extracted data (.dat) and station inventory (.inv) files
data_fname   = glob.glob('ghcnm.v4*/*.dat')[0]
stn_md_fname = glob.glob('ghcnm.v4*/*.inv')[0]

print(data_fname)
print(stn_md_fname)
```

```
ghcnm.v4.0.1.20200205/ghcnm.tavg.v4.0.1.20200205.qfe.dat
ghcnm.v4.0.1.20200205/ghcnm.tavg.v4.0.1.20200205.qfe.inv
```
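`glob.glob(...)[0]` fails with a bare `IndexError` when nothing matches, and when several versions of the archive are extracted side by side it picks whichever file is listed first (Python does not guarantee glob ordering). A small helper, hypothetical and not part of `ghcnm`, makes both cases explicit:

```python
import glob


def find_one(pattern):
    """Return the single file matching `pattern`, or raise a descriptive error."""
    matches = sorted(glob.glob(pattern))
    if not matches:
        raise FileNotFoundError(
            f"No file matches {pattern!r}; download and extract the GHCN-M archive first."
        )
    if len(matches) > 1:
        raise ValueError(f"Several files match {pattern!r}: {matches}")
    return matches[0]


# data_fname   = find_one('ghcnm.v4*/*.dat')
# stn_md_fname = find_one('ghcnm.v4*/*.inv')
```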


```python
stn_md = ghcnm.get_stn_metadata(stn_md_fname)
stn_md
```

| | station | lat | lon | elev | name | country |
|---|---|---|---|---|---|---|
| 0 | ACW00011604 | 57.7667 | 11.8667 | 18.0 | SAVE | Antigua and Barbuda |
| 1 | AE000041196 | 25.3330 | 55.5170 | 34.0 | SHARJAH_INTER_AIRP | United Arab Emirates |
| 2 | AEM00041184 | 25.6170 | 55.9330 | 31.0 | RAS_AL_KHAIMAH_INTE | United Arab Emirates |
| 3 | AEM00041194 | 25.2550 | 55.3640 | 10.4 | DUBAI_INTL | United Arab Emirates |
| 4 | AEM00041216 | 24.4300 | 54.4700 | 3.0 | ABU_DHABI_BATEEN_AIR | United Arab Emirates |
| ... | ... | ... | ... | ... | ... | ... |
| 27428 | ZIXLT371333 | -17.8300 | 31.0200 | 1471.0 | HARARE_BELVEDERE | Zimbabwe |
| 27429 | ZIXLT443557 | -18.9800 | 32.4500 | 1018.0 | GRAND_REEF | Zimbabwe |
| 27430 | ZIXLT622116 | -19.4300 | 29.7500 | 1411.0 | GWELO | Zimbabwe |
| 27431 | | | | | | Palmyra Atoll [United States] |
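For reference, the station inventory is a fixed-width text file. The sketch below is not the `ghcnm.get_stn_metadata` implementation (its source isn't shown here). It assumes the v4 inventory layout given in the README (ID in columns 1-11, latitude 13-20, longitude 22-30, elevation 32-37, name 39-68) and that the first two characters of a station ID are its FIPS country code, consistent with the `EG`/Egypt and `ZI`/Zimbabwe IDs above. Turning codes into the country names shown in the `country` column needs a separate code-to-name lookup, which is not covered here. Check both assumptions against `readme.txt` before relying on this.

```python
import pandas as pd

# Assumed GHCN-M v4 inventory layout (1-based inclusive columns, per readme.txt):
# ID 1-11, LATITUDE 13-20, LONGITUDE 22-30, STNELEV 32-37, NAME 39-68.
# read_fwf expects 0-based, half-open (start, end) pairs.
INV_COLSPECS = [(0, 11), (12, 20), (21, 30), (31, 37), (38, 68)]
INV_NAMES = ["station", "lat", "lon", "elev", "name"]


def read_inventory(path_or_buffer):
    """Parse a GHCN-M .inv file into a DataFrame (sketch, not ghcnm's code)."""
    stn = pd.read_fwf(path_or_buffer, colspecs=INV_COLSPECS, names=INV_NAMES,
                      header=None, dtype={"station": str})
    # Assumption: the first two ID characters are the FIPS country code
    stn["country_code"] = stn["station"].str[:2]
    return stn


# stn = read_inventory(stn_md_fname)
```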


### Specify countries to extract data from

```python
pd.unique(stn_md['country'])
```

```
array(['Antigua and Barbuda', 'United Arab Emirates', 'Afghanistan',
       'Algeria', 'Azerbaijan', 'Albania', 'Armenia', 'Angola',
       'American Samoa [United States]', 'Argentina', 'Australia',
       'Austria', 'Antarctica', 'Bahrain', 'Barbados', 'Botswana',
       'Bermuda [United Kingdom]', 'Belgium', 'Bahamas, The',
       'Bangladesh', 'Belize', 'Bosnia and Herzegovina', 'Bolivia',
       'Burma', 'Benin', 'Belarus', 'Solomon Islands', 'Brazil',
       'Bulgaria', 'Brunei', 'Burundi', 'Canada', 'Cambodia', 'Chad',
       'Sri Lanka', 'Congo (Brazzaville)', 'Congo (Kinshasa)', 'China',
       'Chile', 'Cayman Islands [United Kingdom]',
       'Cocos (Keeling) Islands [Australia]', 'Cameroon', 'Comoros',
       'Colombia', 'Northern Mariana Islands [United States]',
       'Costa Rica', 'Central African Republic', 'Cuba', 'Cape Verde',
       'Cook Islands [New Zealand]', 'Cyprus', 'Denmark', 'Dijibouti',
       'Dominica', 'Dominican Republic', 'Ecuador', 'Egypt', 'Ireland',
```
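Country names follow the GHCN-M list rather than everyday usage (for example 'Bahamas, The', 'Congo (Kinshasa)', 'Burma'), so an exact match with `isin` can silently miss a country. A case-insensitive substring search with pandas' `Series.str.contains` helps find the exact spelling before filtering; `find_countries` is a hypothetical helper:

```python
import pandas as pd


def find_countries(stn_md, fragment):
    """Return the distinct country names containing `fragment` (case-insensitive)."""
    countries = pd.Series(pd.unique(stn_md["country"].dropna()))
    # regex=False treats '(' and ')' in names such as 'Congo (Kinshasa)' literally
    mask = countries.str.contains(fragment, case=False, regex=False)
    return sorted(countries[mask])


# find_countries(stn_md, 'congo')
```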

```python
country_names = ['Jordan', 'Egypt']

# Keep only the stations located in the selected countries
my_stns = stn_md[stn_md['country'].isin(country_names)]

my_stns
```

| | station | lat | lon | elev | name | country |
|---|---|---|---|---|---|---|
| 6808 | EG000062306 | 31.3331 | 27.2167 | 25.0 | MERSA_MATRUH | Egypt |
| 6809 | EG000062414 | 23.9667 | 32.7831 | 200.0 | ASSWAN | Egypt |
| 6810 | EG000062417 | 29.2 | 25.3167 | -15.0 | SIWA | Egypt |
| 6811 | EG000062432 | 25.4831 | 29.0 | 107.0 | DAKHLA | Egypt |
| 6812 | EG000062463 | 27.15 | 33.7167 | 16.0 | HURGUADA | Egypt |
| 6813 | EG000624200 | 28.3331 | 28.9 | 127.0 | BAHARIA | Egypt |
| 6814 | EGE00147724 | 31.55 | 25.18 | 4.0 | SALLOUM | Egypt |
| 6815 | EGE00147726 | 31.28 | 32.2297 | 6.0 | PORT_SAID | Egypt |
| 6816 | EGE00147727 | 30.08 | 31.29 | 30.0 | CAIRO_ABBASSIA | Egypt |
| 6817 | EGE00147728 | 30.05 | 31.25 | 20.0 | CAIRO_EZBEKIYA | Egypt |
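Before extracting data it can be worth checking how many stations each country contributes, for instance with `value_counts()` on the `country` column. If the station IDs begin with a two-letter country code (as `EG` and `JO` appear to here), an alternative selection can be made on the ID prefix instead of the country name. A sketch, with a hypothetical function name:

```python
import pandas as pd


def select_by_prefix(stn_md, prefixes):
    """Select stations whose ID starts with one of the given two-letter codes."""
    # Rows with a missing station ID give NaN here and are excluded by isin
    mask = stn_md["station"].str[:2].isin(list(prefixes))
    # .copy() returns an independent DataFrame, so later column assignments
    # do not act on a slice of stn_md
    return stn_md[mask].copy()


# my_stns = select_by_prefix(stn_md, ['EG', 'JO'])
# my_stns['country'].value_counts()   # number of stations per country
```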


### Extract data for specified stations into a Pandas DataFrame

```python
df = ghcnm.get_data(data_fname, my_stns)
```

```
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df_tmp['year'] = (df_tmp['year'] * 100.) + m # (e.g., 1982 --> 198204)
```
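The message above is pandas' `SettingWithCopyWarning`, raised inside `ghcnm.get_data` at the line that builds the `YYYYMM` date. Pandas emits it when a column is assigned on a DataFrame that may be a view of another one. A common remedy, sketched here under the assumption that `df_tmp` is a filtered subset of a larger frame (the `ghcnm` source isn't shown), is to take an explicit `.copy()` before assigning. The sketch also shows the date arithmetic from that line: 1982 × 100 + 4 = 198204. `add_yyyymm` is a hypothetical helper.

```python
import pandas as pd


def add_yyyymm(df, month):
    """Return a copy of `df` with a YYYYMM integer 'date' column for `month`.

    Assumes `df` has an integer 'year' column. Because of the explicit
    .copy(), the assignment targets a new DataFrame, not a slice of `df`.
    """
    out = df.copy()
    out["date"] = out["year"] * 100 + month  # e.g. 1982 -> 198204 for April
    return out


raw = pd.DataFrame({"station": ["EG000062306", "JOXLT787555"], "year": [1961, 2010]})
subset = raw[raw["station"].str.startswith("EG")]  # a filtered slice of raw
dated = add_yyyymm(subset, 1)
```

Integer arithmetic (`* 100` rather than `* 100.`) also keeps the date code an integer instead of a float.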


```python
df
```

| | station | lat | lon | elev | name | country | date | tavg | dmflag | qcflag | dsflag |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | EG000062306 | 31.3331 | 27.2167 | 25.0 | MERSA_MATRUH | Egypt | 196101 | 13.77 | E | | |
| 1 | EG000062306 | 31.3331 | 27.2167 | 25.0 | MERSA_MATRUH | Egypt | 196102 | 12.63 | E | | |
| 2 | EG000062306 | 31.3331 | 27.2167 | 25.0 | MERSA_MATRUH | Egypt | 196103 | 14.46 | | | S |
| 3 | EG000062306 | 31.3331 | 27.2167 | 25.0 | MERSA_MATRUH | Egypt | 196104 | 17.42 | | | S |
| 4 | EG000062306 | 31.3331 | 27.2167 | 25.0 | MERSA_MATRUH | Egypt | 196105 | 21.56 | | | S |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 26395 | JOXLT787555 | 31.8500 | 35.4500 | 250.0 | JERICHO | Jordan | 201008 | 34.53 | E | | |
| 26396 | JOXLT787555 | 31.8500 | 35.4500 | 250.0 | JERICHO | Jordan | 201009 | 32.09 | E | | |
| 26397 | JOXLT787555 | 31.8500 | 35.4500 | 250.0 | JERICHO | Jordan | 201010 | 29 | E | | |
| 26398 | JOXLT787555 | 31.8500 | 35.4500 | 250.0 | JERICHO | Jordan | 201011 | 24.44 | E | | |
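Each row of `df` is one station-month. In the raw `.dat` file, each line instead holds one station-year with twelve monthly values. The sketch below (not the `ghcnm.get_data` code) shows that reshaping for a single line. It assumes the v4 layout described in the README: station ID in columns 1-11, year in 12-15, element (here `TAVG`) in 16-19, then twelve 8-character blocks, each a 5-digit value in hundredths of a degree Celsius followed by the DMFLAG, QCFLAG and DSFLAG characters, with -9999 marking a missing month. Verify these positions against `readme.txt`.

```python
def parse_dat_line(line):
    """Turn one GHCN-M v4 .dat line into a list of station-month records.

    Assumed layout (0-based slices): ID [0:11], YEAR [11:15], ELEMENT [15:19],
    then 12 blocks of 8 characters starting at 19: VALUE (5), DMFLAG, QCFLAG, DSFLAG.
    """
    station = line[0:11]
    year = int(line[11:15])
    element = line[15:19]
    records = []
    for m in range(12):
        start = 19 + 8 * m
        value = int(line[start:start + 5])
        if value == -9999:  # missing month
            continue
        # Pad in case trailing blanks were stripped from the end of the line
        flags = line[start + 5:start + 8].ljust(3)
        records.append({
            "station": station,
            "date": year * 100 + (m + 1),    # YYYYMM, e.g. 196101
            element.lower(): value / 100.0,  # hundredths of deg C -> deg C
            "dmflag": flags[0].strip() or None,
            "qcflag": flags[1].strip() or None,
            "dsflag": flags[2].strip() or None,
        })
    return records
```

Applying this to every line whose ID is in `my_stns`, building a DataFrame from the records and merging in the station metadata would give a table shaped like `df`.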


### Save to file

```python
df.to_csv('test_ghcnm.csv', index=False)
```
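When the CSV is read back, pandas infers the `date` column as a number and treats the blank flag columns as missing values. A sketch of reloading it with station IDs kept as strings and the `YYYYMM` codes converted to month-start timestamps (`read_ghcnm_csv` is a hypothetical helper):

```python
import pandas as pd


def read_ghcnm_csv(path):
    """Reload the saved CSV, parsing YYYYMM codes into month-start timestamps."""
    out = pd.read_csv(path, dtype={"station": str})
    # Convert to int first so that a float code such as 196101.0 becomes '196101'
    out["date"] = pd.to_datetime(out["date"].astype(int).astype(str), format="%Y%m")
    return out


# df2 = read_ghcnm_csv('test_ghcnm.csv')
```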