# KS_Weather
*Getting weather data with Python 3*  

**Author**: Luiz Moro Rosso  
**Semester**: Spring 2019  
**Project area**: Agronomy  
**Date**: 16 April 2019

<br>

# Second progress report

The script is divided into five main steps:

1. Importing `USER_inputs.csv`;  
2. Identifying the closest land station;  
3. Getting data for the specified time interval;  
4. Replacing missing values for all the variables;  
5. Combining the locations in the `CODE_output.csv`.  


**Required modules:**

In [1]:
import requests
import pandas as pd
import numpy as np

<br>

## 1. Importing `USER_inputs.csv`

1. Fill in the `USER_inputs.csv` file, following the format shown in Table 1;
2. Keep the file in the repository folder on your computer.

**Table 1.** Example of `USER_inputs.csv` showing the required format for each column.

| State          | Location_code  | Latitude       | Longitude      | Start_date     |End_date        |
|:---------------|:---------------|:---------------|:---------------|:---------------|:---------------|
| Kansas         | Manhattan_01   | 00.0000000     | 00.0000000     | MM/DD/YYYY     | MM/DD/YYYY     |

In [2]:
user_input = pd.read_csv('USER_inputs.csv')
user_input['Start_date'] = pd.to_datetime(user_input['Start_date'])
user_input['End_date'] = pd.to_datetime(user_input['End_date'])

user_input

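|     | State  | Location_code    | Latitude  | Longitude  | Start_date | End_date   |
|:----|:-------|:-----------------|:----------|:-----------|:-----------|:-----------|
| 0   | Kansas | Histgen_Ashland  | 39.137253 | -96.636581 | 2018-04-27 | 2018-10-17 |
| 1   | Kansas | Agrocete_Ashland | 39.122417 | -96.637027 | 2018-06-18 | 2018-10-29 |
| 2   | Kansas | NitSulfur_Topeka | 39.076541 | -95.770844 | 2018-05-10 | 2018-10-03 |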

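Before going further, it may help to confirm that the imported table has the columns shown in Table 1. The following is a minimal sketch, not part of the original script; the helper name `check_inputs`, the specific checks, and the in-memory sample are assumptions made for illustration.

```python
import io
import pandas as pd

# Columns required by Table 1
REQUIRED = ['State', 'Location_code', 'Latitude', 'Longitude',
            'Start_date', 'End_date']

def check_inputs(df):
    '''Hypothetical helper: raise if the input table is incomplete or inconsistent.'''
    missing = [col for col in REQUIRED if col not in df.columns]
    if missing:
        raise ValueError('Missing columns: ' + ', '.join(missing))
    if not df['Latitude'].between(-90, 90).all():
        raise ValueError('Latitude must be between -90 and 90')
    if not df['Longitude'].between(-180, 180).all():
        raise ValueError('Longitude must be between -180 and 180')
    if (df['End_date'] < df['Start_date']).any():
        raise ValueError('End_date is earlier than Start_date')
    return df

# In-memory stand-in for USER_inputs.csv (one row taken from the output above)
sample = io.StringIO(
    'State,Location_code,Latitude,Longitude,Start_date,End_date\n'
    'Kansas,Histgen_Ashland,39.137253,-96.636581,04/27/2018,10/17/2018\n'
)
checked = pd.read_csv(sample, parse_dates=['Start_date', 'End_date'])
checked = check_inputs(checked)
```

A failed check raises a `ValueError` naming the problem, so a malformed input file is caught before any web request is made.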

<br>

## 2. Identifying the closest land station

The names and coordinates of all Kansas Mesonet stations are imported from its REST service.

`KANSAS`

In [3]:
Kansas = pd.read_csv('http://mesonet.k-state.edu/rest/stationnames/')

<br>

**Combining locations from different states:**  
*This is the place to change if more states are added in the future.*

In [4]:
Stations = Kansas

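If more states are added, their station tables could be stacked with `pd.concat` at this point. This is a minimal sketch, assuming every state's table exposes the same `NAME`, `LATITUDE` and `LONGITUDE` columns; both tables, the station names and the coordinates below are placeholders made up for illustration.

```python
import pandas as pd

# Toy station tables; names and coordinates are placeholders, not real stations
state_a = pd.DataFrame({'NAME': ['Station A1', 'Station A2'],
                        'LATITUDE': [39.0, 38.5],
                        'LONGITUDE': [-96.5, -97.0]})
state_b = pd.DataFrame({'NAME': ['Station B1'],
                        'LATITUDE': [40.5],
                        'LONGITUDE': [-98.0]})

# Stack the tables and rebuild a 0..n-1 index so that .loc[j] keeps working
stations_all = pd.concat([state_a, state_b], ignore_index=True)
```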
**Function to calculate the distance between coordinates:**  
The `haversin()` function is based on the Haversine formula.

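For reference, the quantities computed by `haversin()` correspond to the usual statement of the formula, where $\varphi$ is latitude and $\lambda$ is longitude (both in radians) and $R$ is the Earth's radius (3958.8 miles in this script):

$$
\begin{aligned}
a &= \sin^2\!\left(\frac{\varphi_2-\varphi_1}{2}\right) + \cos\varphi_1\,\cos\varphi_2\,\sin^2\!\left(\frac{\lambda_2-\lambda_1}{2}\right) \\
c &= 2\,\operatorname{atan2}\!\left(\sqrt{a},\ \sqrt{1-a}\right) \\
d &= R\,c
\end{aligned}
$$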
In [5]:
def haversin(lat1,lon1,lat2,lon2):
    
    '''Great-circle distance (miles) between two points in decimal degrees (Haversine formula)'''
    
    R = 3958.8 # Radius of the earth in miles
    
    dLat = np.radians(lat2-lat1)
    
    dLon = np.radians(lon2-lon1)
    
    a1 = np.sin(dLat/2) * np.sin(dLat/2)
    
    a2 = np.cos(np.radians(lat1)) * np.cos(np.radians(lat2)) * np.sin(dLon/2) * np.sin(dLon/2)
    
    a = a1 + a2 # Just to make the code smaller
    
    c = 2 * np.arctan2(np.sqrt(a), np.sqrt(1-a))
    
    d = R * c # Distance between the points
    
    return d

**Applying the function to all the locations:**  
The distance is calculated from each user location to every station.

In [6]:
Locations = []; NAMES = []; Distance = []

for i in range(0,len(user_input)):
    
    for j in range(0,len(Stations)):
        
        Locations.append(str(user_input.loc[i,'Location_code']))
        
        NAMES.append(str(Stations.loc[j,'NAME']))
        
        Distance.append(haversin(user_input.loc[i,'Latitude'],
                                 user_input.loc[i,'Longitude'],
                                 Stations.loc[j,'LATITUDE'],
                                 Stations.loc[j,'LONGITUDE']))

DISTANCES = pd.DataFrame(
    {'Location_code': Locations,
     'NAME': NAMES,
     'Distance': Distance
    })

DISTANCES.head()

|     | Location_code   | NAME            | Distance   |
|:----|:----------------|:----------------|:-----------|
| 0   | Histgen_Ashland | Ashland 8S      | 221.743973 |
| 1   | Histgen_Ashland | Ashland Bottoms | 0.793205   |
| 2   | Histgen_Ashland | Belleville 2W   | 72.488763  |
| 3   | Histgen_Ashland | Butler          | 93.058521  |
| 4   | Histgen_Ashland | Cairo           | 146.110335 |

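The nested loop above visits every location-station pair one at a time. As an alternative sketch (not the original implementation), the same Haversine calculation can be applied to whole arrays with NumPy broadcasting. The function name `haversin_matrix` and the toy coordinates are assumptions for illustration.

```python
import numpy as np

def haversin_matrix(lat1, lon1, lat2, lon2, R=3958.8):
    '''Distances in miles between every (lat1, lon1) and every (lat2, lon2).'''
    # Column vector for the locations, row vector for the stations
    lat1 = np.radians(np.asarray(lat1, dtype=float))[:, None]
    lon1 = np.radians(np.asarray(lon1, dtype=float))[:, None]
    lat2 = np.radians(np.asarray(lat2, dtype=float))[None, :]
    lon2 = np.radians(np.asarray(lon2, dtype=float))[None, :]

    a = (np.sin((lat2 - lat1) / 2) ** 2 +
         np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    return 2 * R * np.arctan2(np.sqrt(a), np.sqrt(1 - a))

# Toy example: two locations (rows) against three stations (columns)
loc_lat, loc_lon = [0.0, 10.0], [0.0, 20.0]
stn_lat, stn_lon = [0.0, 0.0, 10.0], [0.0, 90.0, 20.0]

dist = haversin_matrix(loc_lat, loc_lon, stn_lat, stn_lon)  # shape (2, 3)
closest = dist.argmin(axis=1)  # column index of the nearest station per location
```

Each row of `dist` holds one location's distances to all stations, so `argmin(axis=1)` picks the nearest station without building the long `DISTANCES` table first.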

**Selecting the station with the smallest distance:**  
This adds the station `NAME` and `Distance` columns to `user_input`.

In [7]:
user_input = pd.merge(user_input, DISTANCES.loc[DISTANCES.groupby("Location_code")["Distance"].idxmin()])

user_input

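|     | State  | Location_code    | Latitude  | Longitude  | Start_date | End_date   | NAME            | Distance |
|:----|:-------|:-----------------|:----------|:-----------|:-----------|:-----------|:----------------|:---------|
| 0   | Kansas | Histgen_Ashland  | 39.137253 | -96.636581 | 2018-04-27 | 2018-10-17 | Ashland Bottoms | 0.793205 |
| 1   | Kansas | Agrocete_Ashland | 39.122417 | -96.637027 | 2018-06-18 | 2018-10-29 | Ashland Bottoms | 0.233405 |
| 2   | Kansas | NitSulfur_Topeka | 39.076541 | -95.770844 | 2018-05-10 | 2018-10-03 | Silver Lake     | 0.093515 |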

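To see what the `groupby(...).idxmin()` step does in isolation, here is a small sketch on a toy table. The location codes, station names and distances are placeholders, not Mesonet data.

```python
import pandas as pd

# Toy long-format table: one row per location-station pair
toy = pd.DataFrame({
    'Location_code': ['Loc_A', 'Loc_A', 'Loc_B', 'Loc_B'],
    'NAME': ['Stn_1', 'Stn_2', 'Stn_1', 'Stn_2'],
    'Distance': [12.0, 3.5, 0.8, 40.0],
})

# idxmin returns, for each location, the row label of its smallest distance
nearest_rows = toy.groupby('Location_code')['Distance'].idxmin()

# Selecting those labels keeps exactly one row (the closest station) per location
nearest = toy.loc[nearest_rows].reset_index(drop=True)
```

Here `nearest` keeps `Stn_2` for `Loc_A` and `Stn_1` for `Loc_B`, which is the shape of table that is then merged back into `user_input`.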

<br>

## 3. Getting data for the specified time interval

The URL (Uniform Resource Locator) depends on the station `NAME`, `Start_date`, and `End_date`.  
All of this information is already available in `user_input`, but the URL still has to be assembled.

**Creating the URL path for each location and time interval:**

In [8]:
Locations = []; URL = []

for i in range(0,len(user_input)):
    
    Locations.append(str(user_input.loc[i,'Location_code']))
    
    URL.append('http://mesonet.k-state.edu/rest/stationdata?stn=' +
               user_input.loc[i,'NAME'] + '&int=day&t_start=' +
               # Dates are formatted as YYYYMMDD and followed by 000000
               user_input.loc[i,'Start_date'].strftime('%Y%m%d') +
               '000000&t_end=' +
               user_input.loc[i,'End_date'].strftime('%Y%m%d') +
               '000000')
    
URLs = pd.DataFrame(
    {'Location_code': Locations,
     'URL': URL
    })

# Adding the URL to the user_input
user_input = pd.merge(user_input, URLs)

user_input

|     | State  | Location_code    | Latitude  | Longitude  | Start_date | End_date   | NAME            | Distance | URL                                               |
|:----|:-------|:-----------------|:----------|:-----------|:-----------|:-----------|:----------------|:---------|:--------------------------------------------------|
| 0   | Kansas | Histgen_Ashland  | 39.137253 | -96.636581 | 2018-04-27 | 2018-10-17 | Ashland Bottoms | 0.793205 | http://mesonet.k-state.edu/rest/stationdata?st... |
| 1   | Kansas | Agrocete_Ashland | 39.122417 | -96.637027 | 2018-06-18 | 2018-10-29 | Ashland Bottoms | 0.233405 | http://mesonet.k-state.edu/rest/stationdata?st... |
| 2   | Kansas | NitSulfur_Topeka | 39.076541 | -95.770844 | 2018-05-10 | 2018-10-03 | Silver Lake     | 0.093515 | http://mesonet.k-state.edu/rest/stationdata?st... |

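The same URL can also be assembled with the standard library's `urlencode`, which takes care of characters such as the space in a station name. This is a sketch rather than the script's own code: the helper name `build_url` is hypothetical, and the query parameters (`stn`, `int`, `t_start`, `t_end`) are simply those that appear in the loop above.

```python
from datetime import date
from urllib.parse import urlencode, quote

BASE_URL = 'http://mesonet.k-state.edu/rest/stationdata'

def build_url(name, start, end):
    '''Hypothetical helper: daily-data URL for one station and date range.'''
    params = {
        'stn': name,
        'int': 'day',
        # YYYYMMDD followed by 000000, as in the loop above
        't_start': start.strftime('%Y%m%d') + '000000',
        't_end': end.strftime('%Y%m%d') + '000000',
    }
    # quote_via=quote encodes spaces in station names as %20
    return BASE_URL + '?' + urlencode(params, quote_via=quote)

url = build_url('Ashland Bottoms', date(2018, 4, 27), date(2018, 10, 17))
```

Because `pd.Timestamp` also provides `strftime`, the `Start_date` and `End_date` values in `user_input` could be passed to such a helper directly.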

**Defining the function to extract data from the web:**

In [9]:
def webaccess(Loc_code,URL):
    
    # HTTP (Hypertext Transfer Protocol) request
    web_data = requests.get(URL)
    
    # Reading the response body as a single string
    web_data = web_data.text
    
    # Splitting the element to get the lines
    web_data = web_data.split('\n')
    
    # Splitting the elements to get the columns
    web_data = [element.split(",") for element in web_data]
    
    # Creating the pandas data frame with header
    web_data = pd.DataFrame(web_data).T.set_index(0).T
    
    # Converting the TIMESTAMP to the date format
    web_data['TIMESTAMP'] = pd.to_datetime(web_data['TIMESTAMP'])
    
    web_data.insert(loc = 0, column = 'Location_code', value = str(Loc_code))
    
    return web_data

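Since the response is comma-separated text, `pd.read_csv` can also parse it directly from an in-memory buffer. The sketch below uses a made-up two-row response instead of a live request; only the `TIMESTAMP`, `PRECIP` and `PRESSUREAVG` column names come from this report, while the values, the timestamp format and the helper name `parse_response` are assumptions.

```python
import io
import pandas as pd

def parse_response(loc_code, text):
    '''Hypothetical helper: turn CSV text into a data frame tagged with its location.'''
    # dtype=str keeps every value as text, so markers such as "M" survive unchanged
    data = pd.read_csv(io.StringIO(text), dtype=str)
    data['TIMESTAMP'] = pd.to_datetime(data['TIMESTAMP'])
    data.insert(loc=0, column='Location_code', value=str(loc_code))
    return data

# Made-up response text standing in for requests.get(URL).text
sample_text = (
    'TIMESTAMP,PRECIP,PRESSUREAVG\n'
    '2018-04-27 00:00:00,0.0,97.81\n'
    '2018-04-28 00:00:00,M,97.80\n'
)
parsed = parse_response('Histgen_Ashland', sample_text)
```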
**For loop to apply the functions and combine all the data:**

In [10]:
frames = []

for i in range(0,user_input.shape[0]):
    
    data = webaccess(user_input.loc[i,'Location_code'],user_input.loc[i,'URL'])
    
    frames.append(data)

# DataFrame.append was removed in pandas 2.0; pd.concat combines all frames at once
CODE_output = pd.concat(frames,ignore_index=True,sort=True)

<br>

## 4. Replacing missing values for all the variables

Values recorded as "M" (missing) are replaced with the value from the previous line.

In [11]:
for i in CODE_output.columns:
    
    for j in range(0,CODE_output.shape[0]):
        
        # The first row has no previous line to copy from
        if j > 0 and CODE_output.loc[j,i] == 'M':
            
            CODE_output.loc[j,i] = CODE_output.loc[j-1,i]

CODE_output.head()

|   | Location_code | PRECIP | PRESSUREAVG | PRESSUREMAX | PRESSUREMIN | RELHUM10MAVG | RELHUM10MMAX | RELHUM10MMIN | RELHUM2MAVG | RELHUM2MMAX | ... | VWC50CM | VWC5CM | WDIR10M | WDIR10MSTD | WDIR2M | WDIR2MSTD | WSPD10MAVG | WSPD10MMAX | WSPD2MAVG | WSPD2MMAX |
|:--|:--|:--|:--|:--|:--|:--|:--|:--|:--|:--|:--|:--|:--|:--|:--|:--|:--|:--|:--|:--|:--|
| 0 | Histgen_Ashland | 0.0 | 97.81 | 98.13 | 97.56 | 64.46 | 99.93 | 32.05 | 66.41 | 99.93 | ... | 0.3984 | 0.3917 | 271.19 | 59.88 | 256.47 | 58.6 | 2.67 | 10.25 | 2.11 | 8.62 |
| 1 | Histgen_Ashland | 0.0 | 97.8 | 98.18 | 97.31 | 38.37 | 82.1 | 16.77 | 42.94 | 82.1 | ... | 0.3983 | 0.3839 | 246.51 | 52.46 | 223.06 | 57.47 | 3.92 | 11.88 | 3.07 | 11.0 |
| 2 | Histgen_Ashland | 0.0 | 98.12 | 98.39 | 97.79 | 39.32 | 70.42 | 19.09 | 41.96 | 70.42 | ... | 0.3987 | 0.3775 | 61.33 | 36.65 | 53.81 | 39.63 | 3.33 | 8.29 | 2.49 | 6.84 |
| 3 | Histgen_Ashland | 0.0 | 97.9 | 98.29 | 97.43 | 43.39 | 55.88 | 34.57 | 44.89 | 55.88 | ... | 0.3997 | 0.3676 | 118.18 | 22.13 | 105.24 | 24.56 | 6.58 | 15.39 | 4.47 | 12.31 |
| 4 | Histgen_Ashland | 0.0 | 97.26 | 97.64 | 96.83 | 44.93 | 70.56 | 24.2 | 46.24 | 70.56 | ... | 0.4 | 0.3594 | 161.24 | 10.77 | 155.61 | 12.92 | 9.64 | 20.48 | 6.54 | 14.62 |

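Note that a row-by-row replacement can copy a value across the boundary between two locations, because the rows of all locations are stacked in one table. As an alternative sketch, under the assumption that values should only be carried forward within the same `Location_code`, the replacement can be done per group. The toy table below is made up for illustration.

```python
import pandas as pd

toy = pd.DataFrame({
    'Location_code': ['Loc_A', 'Loc_A', 'Loc_A', 'Loc_B', 'Loc_B'],
    'PRECIP': ['0.0', 'M', '1.2', 'M', '0.4'],
})

values = toy.drop(columns='Location_code')

# Turn "M" into NaN, then forward-fill separately within each location
filled = values.mask(values.eq('M')).groupby(toy['Location_code']).ffill()

# Convert the remaining text to numbers and put the location column back
cleaned = pd.concat([toy[['Location_code']], filled.apply(pd.to_numeric)], axis=1)
```

In this toy table the first `Loc_B` row stays missing, because it has no earlier value within its own group to copy from.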

<br>

## 5. Combining the locations in the `CODE_output.csv`

Saving the final `CODE_output.csv` in the repository folder.

In [12]:
CODE_output.to_csv('CODE_output.csv', sep=',', encoding='utf-8')