### **Extracting Weather Data from the Visual Crossing Weather API**

Adding weather data can make an NYC taxi demand model more accurate and more useful, because weather affects both passenger demand and driver supply. For example, when it is raining or snowing, more people take taxis instead of walking or cycling, and high temperatures may have a similar effect. These effects can also vary by time of day and by part of the city, which shifts demand patterns. By including weather variables such as temperature, precipitation, humidity and wind speed, the model can capture these effects and give a more complete picture of how weather influences taxi demand.

In [None]:
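import os
from io import StringIO
from urllib.parse import quote

import pandas as pd
import requests

BASE_URL = "https://weather.visualcrossing.com/VisualCrossingWebServices/rest/services/timeline"


def get_weather_data(api_key, location, start_date, end_date, content):
    """Fetch Visual Crossing timeline data as a DataFrame (None on failure)."""
    # Location and date range are path segments; quote the location so the
    # space and comma in "New York, NY" are percent-encoded.
    request_url = f"{BASE_URL}/{quote(location, safe='')}/{start_date}/{end_date}"
    params = {
        "unitGroup": "us",     # US units (°F, mph, inches)
        "include": content,    # 'hours' returns hourly records
        "key": api_key,
        "contentType": "csv",
    }
    response = requests.get(request_url, params=params, timeout=60)
    if response.status_code == 200:
        return pd.read_csv(StringIO(response.text))
    print(f"Failed to fetch data: HTTP {response.status_code}")
    return None


# Configuration. Keep the API key out of the notebook: read it from an
# environment variable (the variable name is our own choice; set it before running).
api_key = os.environ["VISUAL_CROSSING_API_KEY"]
location = "New York, NY"
start_date = '2023-02-01'
end_date = '2023-02-28'
content = 'hours'

weather_df_feb = get_weather_data(api_key, location, start_date, end_date, content)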
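The cell above requests February only, while the next cell loads January, February and March from three separate CSV files. One plausible way to produce such files is to loop over calendar-month boundaries. The sketch below only computes those boundaries and the matching timeline request URLs (it makes no network call); `month_ranges`, `build_request_url` and the `YOUR_KEY` placeholder are our own illustrative names, not part of the original notebook or of any Visual Crossing client library.

```python
import calendar
from urllib.parse import quote, urlencode

BASE_URL = "https://weather.visualcrossing.com/VisualCrossingWebServices/rest/services/timeline"


def month_ranges(year, first_month, last_month):
    """Return (start_date, end_date) ISO date strings for each month in the range."""
    ranges = []
    for month in range(first_month, last_month + 1):
        last_day = calendar.monthrange(year, month)[1]  # number of days in the month
        ranges.append((f"{year}-{month:02d}-01", f"{year}-{month:02d}-{last_day:02d}"))
    return ranges


def build_request_url(location, start_date, end_date, api_key, include="hours"):
    """Build a timeline request URL for one date range, without sending it."""
    path = f"{BASE_URL}/{quote(location, safe='')}/{start_date}/{end_date}"
    query = urlencode({"unitGroup": "us", "include": include,
                       "key": api_key, "contentType": "csv"})
    return f"{path}?{query}"


# Hypothetical usage: one URL per month for January to March 2023.
for start, end in month_ranges(2023, 1, 3):
    print(build_request_url("New York, NY", start, end, api_key="YOUR_KEY"))
```

Passing each `(start, end)` pair to `get_weather_data` and saving the result with `DataFrame.to_csv` would yield one file per month; computing the last day with `calendar.monthrange` avoids hard-coding month lengths such as February's 28 days.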





In [None]:
weather_df_jan = pd.read_csv('weather_data_jan_hourly (1).csv')

weather_df_feb = pd.read_csv('weather_data_feb_hourly (1).csv')

weather_df_mar = pd.read_csv('weather_data_march_hourly (1).csv')

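# ignore_index=True gives the combined frame a fresh 0..n-1 index instead of
# three overlapping per-month indexes.
weather_df = pd.concat([weather_df_jan, weather_df_feb, weather_df_mar], ignore_index=True)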

weather_df.shape

(2159, 24)
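January (31 days), February (28) and March (31) add up to 90 days, or 2,160 hourly rows, but the combined frame has 2,159. The most likely cause is the start of daylight saving time on 12 March 2023, when local clocks skipped from 01:59 to 03:00, so no 02:00 reading exists. Rather than assume this, the gap can be located directly. The sketch below is our own helper (`find_missing_hours` is a hypothetical name), shown on a small synthetic series around that night.

```python
import pandas as pd


def find_missing_hours(timestamps):
    """Return the hourly timestamps between the first and last reading that are absent."""
    ts = pd.DatetimeIndex(pd.to_datetime(timestamps))
    expected = pd.date_range(ts.min(), ts.max(), freq="h")  # complete hourly grid
    return expected.difference(ts)


# Synthetic local (naive) readings around the 12 March 2023 DST change.
readings = pd.Series(["2023-03-12T00:00:00", "2023-03-12T01:00:00",
                      "2023-03-12T03:00:00", "2023-03-12T04:00:00"])
print(find_missing_hours(readings))
```

On the notebook data, `find_missing_hours(weather_df['datetime'])` should list any hours absent from the three monthly files; a result other than the single DST hour would point to an incomplete download.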

In [None]:
weather_df.head()

,name,datetime,temp,feelslike,dew,humidity,precip,precipprob,preciptype,snow,...,sealevelpressure,cloudcover,visibility,solarradiation,solarenergy,uvindex,severerisk,conditions,icon,stations
0,"New York, NY, United States",2023-01-01T00:00:00,54.0,54.0,51.8,92.43,0.009,100,rain,0.0,...,1007.1,100.0,7.7,227,0.8,0,10,"Rain, Overcast",rain,"72505394728,KLGA,D2180,KNYC,72503014732"
1,"New York, NY, United States",2023-01-01T01:00:00,53.1,53.1,50.9,92.14,0.001,100,rain,0.0,...,1007.2,100.0,7.2,228,0.8,0,10,"Rain, Overcast",rain,"72505394728,KLGA,D2180,KNYC,72503014732"
2,"New York, NY, United States",2023-01-01T02:00:00,53.3,53.3,50.0,88.43,0.004,100,rain,0.0,...,1006.8,43.6,7.7,227,0.8,0,10,"Rain, Partially cloudy",rain,"72505394728,KLGA,D2180,KNYC,72503014732"
3,"New York, NY, United States",2023-01-01T03:00:00,52.6,52.6,49.7,89.82,0.0,0,,0.0,...,1007.8,13.4,9.0,224,0.8,0,10,Clear,clear-night,"72505394728,KLGA,D2180,KNYC,72503014732"
4,"New York, NY, United States",2023-01-01T04:00:00,53.3,53.3,48.6,83.95,0.0,0,,0.0,...,1008.1,7.2,9.9,222,0.8,0,10,Clear,clear-night,"72505394728,KLGA,D2180,KNYC,72503014732"


In [None]:
weather_df.isnull().sum()

name                   0
datetime               0
temp                   0
feelslike              0
dew                    0
humidity               0
precip                 0
precipprob             0
preciptype          1805
snow                   0
snowdepth              0
windgust               0
windspeed              0
winddir                0
sealevelpressure       0
cloudcover             0
visibility             0
solarradiation         0
solarenergy            0
uvindex                0
severerisk             0
conditions             0
icon                   0
stations               0
dtype: int64

The null counts show that `preciptype` is the only column with missing values (1,805 of 2,159 rows). Since we do not use this column for modelling, we drop it. The `datetime` column is stored as a string (`object` dtype) in ISO 8601 format, with a `T` separating the date and the time (e.g. `2023-01-01T00:00:00`). As part of data cleaning, we convert it to a proper `datetime64` column, which also removes the `T` from the displayed values.

In [None]:
weather_df.drop(columns=['preciptype'], inplace=True)

In [None]:
weather_df.dtypes

name                 object
datetime             object
temp                float64
feelslike           float64
dew                 float64
humidity            float64
precip              float64
precipprob            int64
snow                float64
snowdepth           float64
windgust            float64
windspeed           float64
winddir               int64
sealevelpressure    float64
cloudcover          float64
visibility          float64
solarradiation        int64
solarenergy         float64
uvindex               int64
severerisk            int64
conditions           object
icon                 object
stations             object
dtype: object

In [None]:
weather_df.describe()

,temp,feelslike,dew,humidity,precip,precipprob,snow,snowdepth,windgust,windspeed,winddir,sealevelpressure,cloudcover,visibility,solarradiation,solarenergy,uvindex,severerisk
count,2159.0,2159.0,2159.0,2159.0,2159.0,2159.0,2159.0,2159.0,2159.0,2159.0,2159.0,2159.0,2159.0,2159.0,2159.0,2159.0,2159.0,2159.0
mean,42.988837,39.275035,28.474247,59.207638,0.00289,10.792033,0.002209,0.036836,16.678648,7.170218,213.452987,1016.016906,50.020658,9.33182,144.893932,0.52163,1.037054,10.0
std,7.82185,10.184765,10.513108,19.055924,0.016161,31.035165,0.056185,0.277801,8.520598,3.981921,111.14615,7.880842,45.500193,1.655673,186.480523,0.671807,1.916592,0.0
min,6.4,-10.2,-11.4,18.13,0.0,0.0,0.0,0.0,2.2,0.0,0.0,991.3,0.0,0.6,0.0,0.0,0.0,10.0
25%,38.0,32.9,21.6,45.3,0.0,0.0,0.0,0.0,9.2,4.7,100.0,1010.45,1.5,9.9,0.0,0.0,0.0,10.0
50%,42.4,38.7,27.9,56.9,0.0,0.0,0.0,0.0,16.1,6.7,258.0,1016.5,32.0,9.9,59.0,0.2,0.0,10.0
75%,47.9,45.3,35.7,73.59,0.0,0.0,0.0,0.0,21.55,9.2,300.0,1021.9,100.0,9.9,220.0,0.8,1.0,10.0
max,65.7,65.7,56.4,99.72,0.268,100.0,2.4,3.23,59.7,28.3,360.0,1039.0,100.0,9.9,894.0,3.2,9.0,10.0
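In this summary, `severerisk` has the same value in every row: its minimum and maximum are both 10.0 and its standard deviation is 0. A constant column cannot help a model distinguish one hour from another. The `name` column is presumably constant as well, since every row comes from a single location query. The sketch below (`constant_columns` is a hypothetical helper, demonstrated on a small synthetic frame) lists such columns so they can be reviewed before modelling.

```python
import pandas as pd


def constant_columns(df):
    """Return the names of columns that hold a single value (NaN counts as a value)."""
    return [col for col in df.columns if df[col].nunique(dropna=False) <= 1]


# Synthetic frame mimicking the pattern seen above: severerisk never varies.
sample = pd.DataFrame({
    "temp": [54.0, 53.1, 53.3],
    "severerisk": [10, 10, 10],
    "name": ["New York, NY, United States"] * 3,
})
print(constant_columns(sample))
```

Applied to the notebook data, `weather_df.drop(columns=constant_columns(weather_df))` would remove them; whether to do so is a modelling choice.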


In [None]:
# Parse the ISO 8601 strings (e.g. '2023-01-01T00:00:00') directly into datetime64[ns].
weather_df['datetime'] = pd.to_datetime(weather_df['datetime'])


In [None]:
weather_df.head()

,name,datetime,temp,feelslike,dew,humidity,precip,precipprob,snow,snowdepth,...,sealevelpressure,cloudcover,visibility,solarradiation,solarenergy,uvindex,severerisk,conditions,icon,stations
0,"New York, NY, United States",2023-01-01 00:00:00,54.0,54.0,51.8,92.43,0.009,100,0.0,0.0,...,1007.1,100.0,7.7,227,0.8,0,10,"Rain, Overcast",rain,"72505394728,KLGA,D2180,KNYC,72503014732"
1,"New York, NY, United States",2023-01-01 01:00:00,53.1,53.1,50.9,92.14,0.001,100,0.0,0.0,...,1007.2,100.0,7.2,228,0.8,0,10,"Rain, Overcast",rain,"72505394728,KLGA,D2180,KNYC,72503014732"
2,"New York, NY, United States",2023-01-01 02:00:00,53.3,53.3,50.0,88.43,0.004,100,0.0,0.0,...,1006.8,43.6,7.7,227,0.8,0,10,"Rain, Partially cloudy",rain,"72505394728,KLGA,D2180,KNYC,72503014732"
3,"New York, NY, United States",2023-01-01 03:00:00,52.6,52.6,49.7,89.82,0.0,0,0.0,0.0,...,1007.8,13.4,9.0,224,0.8,0,10,Clear,clear-night,"72505394728,KLGA,D2180,KNYC,72503014732"
4,"New York, NY, United States",2023-01-01 04:00:00,53.3,53.3,48.6,83.95,0.0,0,0.0,0.0,...,1008.1,7.2,9.9,222,0.8,0,10,Clear,clear-night,"72505394728,KLGA,D2180,KNYC,72503014732"


In [None]:
weather_df.dtypes

name                        object
datetime            datetime64[ns]
temp                       float64
feelslike                  float64
dew                        float64
humidity                   float64
precip                     float64
precipprob                   int64
snow                       float64
snowdepth                  float64
windgust                   float64
windspeed                  float64
winddir                      int64
sealevelpressure           float64
cloudcover                 float64
visibility                 float64
solarradiation               int64
solarenergy                float64
uvindex                      int64
severerisk                   int64
conditions                  object
icon                        object
stations                    object
dtype: object

In [None]:
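# Truncate each taxi pickup timestamp to the start of its hour so it lines up
# with the hourly 'datetime' column of weather_df ('h' is the hourly alias;
# 'H' is deprecated as of pandas 2.2).
nyc_data_eng['pickup_hour'] = nyc_data_eng['pickup_datetime'].dt.floor('h')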
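Flooring pickup times to the hour makes them directly comparable with the hourly `datetime` column of the weather data. A minimal sketch of the join this enables, on toy frames standing in for `nyc_data_eng` and `weather_df`, is shown below. The left join and the `validate="many_to_one"` check are choices of this sketch, not necessarily what the notebook does next. The check raises a `MergeError` if any weather hour appears twice, which would otherwise silently duplicate taxi trips.

```python
import pandas as pd

# Hypothetical toy frames: taxi pickups and hourly weather readings.
trips = pd.DataFrame({
    "pickup_datetime": pd.to_datetime(["2023-01-01 00:15", "2023-01-01 00:50",
                                       "2023-01-01 01:05"]),
})
weather = pd.DataFrame({
    "datetime": pd.to_datetime(["2023-01-01 00:00", "2023-01-01 01:00"]),
    "temp": [54.0, 53.1],
})

# Truncate each pickup to the start of its hour, then attach that hour's weather.
trips["pickup_hour"] = trips["pickup_datetime"].dt.floor("h")
merged = trips.merge(weather, left_on="pickup_hour", right_on="datetime",
                     how="left", validate="many_to_one")
print(merged[["pickup_datetime", "pickup_hour", "temp"]])
```

A left join keeps every trip even when its hour has no weather row (for example, the skipped DST hour), leaving `NaN` weather values that can be inspected or filled afterwards.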