### ![](https://ga-dash.s3.amazonaws.com/production/assets/logo-9f88ae6c9c3871690e33280fcf557f33.png)

# Unit 5 Lab: Importing Libraries and Pandas

## Overview

Welcome to the Unit 5 lab!

Now that we've learned a little about libraries, we can make some changes to our weather forecast application to make it more efficient and flexible.

### Goals

In this lab, you will:

- Utilize external libraries to extend the functionality of our `Forecast` class.
- Work with and manipulate datetime objects to provide more accurate weekly forecast data.

---

### Adding the `datetime` and `calendar` Libraries
Let's apply the power of the `datetime` and `calendar` libraries to our application. Start by adding an import for the `datetime` library.

To save the user from an unnecessary input, we're going to set the current year for them. 

- Create an instance variable named `forecast_year` inside the `__init__()` function and set the value to `datetime.datetime.now().year`.

_Note: The `datetime.datetime.now()` method creates a `datetime` object for the current execution time. The result looks like `2019-06-13 18:48:30.733505`. Once we've created a `datetime` object, we can grab any attribute from it that we need, such as `.year`, `.month`, etc._

- Add two more arguments to our `__init__()` method:
  - `user_input_month`
  - `user_input_day`
  
- Create an instance variable, `forecast_date`, inside the `__init__()` method.
- Set it to a new `datetime` object using `self.forecast_year`, `user_input_month`, and `user_input_day`.

_Hint: To create a `datetime` object, use `datetime.datetime(year,month,day)`._
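
As a minimal sketch of the two ideas above (reading attributes from a `datetime` object and building one from a year, month, and day), consider the following. The fixed values and the names `example_timestamp` and `example_date` are illustrative only and are not part of the lab's `Forecast` class.

```python
import datetime

# A fixed timestamp (the sample value from the note) so the result is repeatable.
example_timestamp = datetime.datetime(2019, 6, 13, 18, 48, 30, 733505)
print(example_timestamp)        # 2019-06-13 18:48:30.733505
print(example_timestamp.year)   # 2019
print(example_timestamp.month)  # 6

# Build a date from a year, month, and day; the time defaults to midnight.
example_date = datetime.datetime(example_timestamp.year, 6, 13)
print(example_date)             # 2019-06-13 00:00:00
```

Keep in mind that an impossible date, such as month `2` with day `30`, raises a `ValueError`, so user-supplied months and days may need checking.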

In [1]:
import datetime

one_day_of_hourly_temperatures = [67,67,68,69,71,73,75,76,79,81,81,80,82,81,81,80,78,75,72,70,67,65,66,66]
one_day_of_hourly_humidity = [60,65,65,70,70,70,70,75,75,75,75,80,80,85,85,85,85,80,80,80,80,80,80,80]
one_day_of_hourly_rainfall = [0,0,0,0.1,0.1,0.05,0.1,0.15,0.2,0.3,0.3,0.5,0,0,0,0,0,0,0,0,0,0,0,0]

class Forecast():
    
    def __init__(self, location, user_input_month, user_input_day):
        self.location = location
        self.forecast_year = datetime.datetime.now().year
        self.forecast_date = datetime.datetime(self.forecast_year, user_input_month, user_input_day)
    
    def __get_daily_high(self):
        return max(one_day_of_hourly_temperatures)
    
    def __get_daily_low(self):
        return min(one_day_of_hourly_temperatures)
    
    def __get_daily_chance_of_rain(self):
        number_of_years_of_data = 10
        times_it_has_rained = 0
        
        if sum(one_day_of_hourly_rainfall):
            times_it_has_rained += 1
            
        return times_it_has_rained / number_of_years_of_data * 100
    
    def display_daily_forecast(self):
        print(f"The weather forecast for today in "
              f"{self.location} is: High of {self.__get_daily_high()}"
              f", Low of {self.__get_daily_low()}, with a "
              f"{self.__get_daily_chance_of_rain()}% chance of rain.")
    
    def display_weekly_forecast(self):
        print(f'The week\'s weather forecast for: \
            \n   Monday:    High {self.__get_daily_high()}, Low {self.__get_daily_low()}, Rain {self.__get_daily_chance_of_rain()}%\
            \n   Tuesday:   High {self.__get_daily_high()}, Low {self.__get_daily_low()}, Rain {self.__get_daily_chance_of_rain()}%\
            \n   Wednesday: High {self.__get_daily_high()}, Low {self.__get_daily_low()}, Rain {self.__get_daily_chance_of_rain()}%\
            \n   Thursday:  High {self.__get_daily_high()}, Low {self.__get_daily_low()}, Rain {self.__get_daily_chance_of_rain()}%\
            \n   Friday:    High {self.__get_daily_high()}, Low {self.__get_daily_low()}, Rain {self.__get_daily_chance_of_rain()}%\
            \n   Saturday:  High {self.__get_daily_high()}, Low {self.__get_daily_low()}, Rain {self.__get_daily_chance_of_rain()}%\
            \n   Sunday:    High {self.__get_daily_high()}, Low {self.__get_daily_low()}, Rain {self.__get_daily_chance_of_rain()}%')
        
test = Forecast("Austin, TX", 6, 13)
test.display_daily_forecast()

### Update to Display the Daily Forecast

Now it's time to display the dates.

1. Add an import for the `calendar` library.
2. Create a new _private_ method called `__get_day_of_week()`. _(Note: Because this method is only used within the class, it's prefixed by two underscores)._ This method will accept the integer arguments `month` and `day`. It will return a string such as `'Monday'` for any date we send it.

_Hint: The `calendar.day_name` sequence can be indexed with an integer day of the week to get that day's name as a string. For example, `calendar.day_name[0]` returns `'Monday'` and `calendar.day_name[6]` returns `'Sunday'`._

_Hint: The `weekday()` method of a `datetime` object returns its day of the week as an integer, where Monday is `0` and Sunday is `6`. For example, `datetime.datetime(1985, 1, 22).weekday()` returns `1`, which is a Tuesday._

3. Add the `__get_day_of_week()` result to your `display_daily_forecast()` method.
4. Call this method with `self.forecast_date.month` and `self.forecast_date.day`.

The output of calling `display_daily_forecast()` should look like this (the day name depends on the current year; in 2019, 12/31 falls on a Tuesday):
  
> The weather forecast for Tuesday 12/31 in Austin, TX is: High of 82, Low of 65, with a 10.0% chance of rain.
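
The two hints above can be combined in a few lines. This is only a sketch on a fixed date, using the illustrative names `sample_date`, `weekday_number`, and `weekday_name`; it is not the lab's `__get_day_of_week()` method itself.

```python
import calendar
import datetime

# The date used in the hint above.
sample_date = datetime.datetime(1985, 1, 22)

# weekday() gives an integer: Monday is 0, Sunday is 6.
weekday_number = sample_date.weekday()

# Index calendar.day_name with that integer to get the day's name.
weekday_name = calendar.day_name[weekday_number]

print(weekday_number, weekday_name)  # 1 Tuesday (in an English locale)
```

The names in `calendar.day_name` follow the current locale, so the string may differ on a machine set to another language.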

In [2]:
import datetime
import calendar

one_day_of_hourly_temperatures = [67,67,68,69,71,73,75,76,79,81,81,80,82,81,81,80,78,75,72,70,67,65,66,66]
one_day_of_hourly_humidity = [60,65,65,70,70,70,70,75,75,75,75,80,80,85,85,85,85,80,80,80,80,80,80,80]
one_day_of_hourly_rainfall = [0,0,0,0.1,0.1,0.05,0.1,0.15,0.2,0.3,0.3,0.5,0,0,0,0,0,0,0,0,0,0,0,0]

class Forecast():
    
    def __init__(self, location, user_input_month, user_input_day):
        self.location = location
        self.forecast_year = datetime.datetime.now().year
        self.forecast_date = datetime.datetime(self.forecast_year, user_input_month, user_input_day)
    
    def __get_day_of_week(self, month, day):
        return calendar.day_name[datetime.datetime(self.forecast_year, month, day).weekday()]
    
    def __get_daily_high(self):
        return max(one_day_of_hourly_temperatures)
    
    def __get_daily_low(self):
        return min(one_day_of_hourly_temperatures)
    
    def __get_daily_chance_of_rain(self):
        number_of_years_of_data = 10
        times_it_has_rained = 0
        
        if sum(one_day_of_hourly_rainfall):
            times_it_has_rained += 1
            
        return times_it_has_rained / number_of_years_of_data * 100
    
    def display_daily_forecast(self):
        day_of_week = self.__get_day_of_week(self.forecast_date.month, self.forecast_date.day)
        print(f"The weather forecast for {day_of_week} "
              f"{self.forecast_date.month}/{self.forecast_date.day}"
              f" in {self.location} is: "
              f"High of {self.__get_daily_high()}, Low of {self.__get_daily_low()}"
              f", with a {self.__get_daily_chance_of_rain()}% chance of rain.")
    
    def display_weekly_forecast(self):
        print(f'The week\'s weather forecast for: \
            \n   Monday:    High {self.__get_daily_high()}, Low {self.__get_daily_low()}, Rain {self.__get_daily_chance_of_rain()}%\
            \n   Tuesday:   High {self.__get_daily_high()}, Low {self.__get_daily_low()}, Rain {self.__get_daily_chance_of_rain()}%\
            \n   Wednesday: High {self.__get_daily_high()}, Low {self.__get_daily_low()}, Rain {self.__get_daily_chance_of_rain()}%\
            \n   Thursday:  High {self.__get_daily_high()}, Low {self.__get_daily_low()}, Rain {self.__get_daily_chance_of_rain()}%\
            \n   Friday:    High {self.__get_daily_high()}, Low {self.__get_daily_low()}, Rain {self.__get_daily_chance_of_rain()}%\
            \n   Saturday:  High {self.__get_daily_high()}, Low {self.__get_daily_low()}, Rain {self.__get_daily_chance_of_rain()}%\
            \n   Sunday:    High {self.__get_daily_high()}, Low {self.__get_daily_low()}, Rain {self.__get_daily_chance_of_rain()}%')
        
test = Forecast("Austin, TX", 12, 31)
test.display_daily_forecast()

### Update to Display the Weekly Forecast

- Update the `display_weekly_forecast()` method so that each day of the forecast is listed by both its day name and its numeric date. You can compute each date by adding a `datetime.timedelta` to the starting date. Here's an example:

```python
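import datetime

# Start from the current date and time...
current_date = datetime.datetime.now()
# ...then add whole days with timedelta (the keyword is `days`, plural).
tomorrows_date = current_date + datetime.timedelta(days=1)
two_days_from_current_date = current_date + datetime.timedelta(days=2)
three_days_from_current_date = current_date + datetime.timedelta(days=3)
...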
```

_Hint: Use a loop to print each of these lines rather than writing them out one by one._


- The output of `display_weekly_forecast()` should look like:
```
The week's weather forecast for:
    Tuesday, 12/31: High 82, Low 65, Rain 10.0%
    Wednesday, 1/1: High 82, Low 65, Rain 10.0%
    Thursday, 1/2: High 82, Low 65, Rain 10.0%
    Friday, 1/3: High 82, Low 65, Rain 10.0%
    Saturday, 1/4: High 82, Low 65, Rain 10.0%
    Sunday, 1/5: High 82, Low 65, Rain 10.0%
    Monday, 1/6: High 82, Low 65, Rain 10.0%
```
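
The `timedelta` approach described above can be tried on a fixed start date before wiring it into the class. The sketch below assumes a starting date of December 31, 2019 and uses the illustrative names `start_date` and `upcoming_dates`. It shows that adding days rolls over into the next month and year automatically.

```python
import datetime

# Fixed start date so the result does not depend on when this runs.
start_date = datetime.datetime(2019, 12, 31)

# Offsets 0, 1, 2 give the start date and the two days after it.
upcoming_dates = [start_date + datetime.timedelta(days=offset)
                  for offset in range(3)]

for date in upcoming_dates:
    print(f"{date.month}/{date.day}/{date.year}")
# 12/31/2019
# 1/1/2020
# 1/2/2020
```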

In [3]:
import datetime
import calendar

one_day_of_hourly_temperatures = [67,67,68,69,71,73,75,76,79,81,81,80,82,81,81,80,78,75,72,70,67,65,66,66]
one_day_of_hourly_humidity = [60,65,65,70,70,70,70,75,75,75,75,80,80,85,85,85,85,80,80,80,80,80,80,80]
one_day_of_hourly_rainfall = [0,0,0,0.1,0.1,0.05,0.1,0.15,0.2,0.3,0.3,0.5,0,0,0,0,0,0,0,0,0,0,0,0]

class Forecast():
    
    def __init__(self, location, user_input_month, user_input_day):
        self.location = location
        self.forecast_year = datetime.datetime.now().year
        self.forecast_date = datetime.datetime(self.forecast_year,user_input_month,user_input_day)
    
    def __get_day_of_week(self, date):
        return calendar.day_name[date.weekday()]
    
    def __get_daily_high(self):
        return max(one_day_of_hourly_temperatures)
    
    def __get_daily_low(self):
        return min(one_day_of_hourly_temperatures)
    
    def __get_daily_chance_of_rain(self):
        number_of_years_of_data = 10
        times_it_has_rained = 0
        
        if sum(one_day_of_hourly_rainfall):
            times_it_has_rained += 1
            
        return times_it_has_rained / number_of_years_of_data * 100
    
    def display_daily_forecast(self):
        day_of_week = self.__get_day_of_week(self.forecast_date)
        print(f"The weather forecast for {day_of_week} "
              f"{self.forecast_date.month}/{self.forecast_date.day}"
              f" in {self.location} is: High of "
              f"{self.__get_daily_high()}, Low of {self.__get_daily_low()}"
              f", with a {self.__get_daily_chance_of_rain()}% chance of rain.")
    
    def display_weekly_forecast(self):
        print("The week's weather forecast for:")

        for i in range(7):
            # Step forward one day at a time from the starting forecast date.
            current_date = self.forecast_date + datetime.timedelta(days=i)
            print(f"   {self.__get_day_of_week(current_date)}, "
                  f"{current_date.month}/{current_date.day}: "
                  f"High {self.__get_daily_high()}, Low {self.__get_daily_low()},"
                  f" Rain {self.__get_daily_chance_of_rain()}%")

test = Forecast("Austin, TX", 12, 31)
test.display_weekly_forecast()

The week's weather forecast for:
   Tuesday, 12/31: High 82, Low 65, Rain 10.0%
   Wednesday, 1/1: High 82, Low 65, Rain 10.0%
   Thursday, 1/2: High 82, Low 65, Rain 10.0%
   Friday, 1/3: High 82, Low 65, Rain 10.0%
   Saturday, 1/4: High 82, Low 65, Rain 10.0%
   Sunday, 1/5: High 82, Low 65, Rain 10.0%
   Monday, 1/6: High 82, Low 65, Rain 10.0%


### Adding Real Weather Data

- Import the Pandas library.
- Read in the following four columns from the `raw_weather_data.csv` file, located in `./data`. Place the resulting DataFrame into a variable named `hourly_weather_data`.
  - `DATE`
  - `REPORT_TYPE`
  - `HourlyDryBulbTemperature`
  - `HourlyPrecipitation`
- Print the shape of the DataFrame.
- Print the first 10 rows of the DataFrame.
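
To see what `usecols` does without touching the lab's file, the steps above can be tried on a small in-memory CSV. Everything in this sketch is made up for illustration: the two data rows, the extra `STATION_NOTE` column, and the variable names `csv_text` and `sample`.

```python
import io

import pandas as pd

# Made-up sample; STATION_NOTE is a hypothetical column we do not want to load.
csv_text = """DATE,REPORT_TYPE,STATION_NOTE
2009-01-01T00:53:00,FM-15,a
2009-01-01T01:53:00,FM-15,b
"""

# usecols keeps only the listed columns and skips the rest while reading.
sample = pd.read_csv(io.StringIO(csv_text), usecols=['DATE', 'REPORT_TYPE'])

print(sample.shape)    # (2, 2): two rows, two columns
print(sample.head(1))  # first row only
```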

In [4]:
import pandas as pd

hourly_weather_data = pd.read_csv(
    './data/raw_weather_data.csv', 
    usecols=['DATE','REPORT_TYPE',
             'HourlyDryBulbTemperature',
             'HourlyPrecipitation'],
    low_memory=False
)

print(hourly_weather_data.shape)
print(hourly_weather_data.head(10))


(128130, 4)
                  DATE REPORT_TYPE HourlyDryBulbTemperature  \
0  2009-01-01T00:00:00       FM-12                       47   
1  2009-01-01T00:53:00       FM-15                       37   
2  2009-01-01T01:53:00       FM-15                       35   
3  2009-01-01T02:53:00       FM-15                       33   
4  2009-01-01T03:53:00       FM-15                       36   
5  2009-01-01T04:53:00       FM-15                       35   
6  2009-01-01T05:53:00       FM-15                       33   
7  2009-01-01T06:00:00       FM-12                       39   
8  2009-01-01T06:53:00       FM-15                       32   
9  2009-01-01T07:53:00       FM-15                       32   

  HourlyPrecipitation  
0                 NaN  
1                0.00  
2                0.00  
3                0.00  
4                0.00  
5                0.00  
6                0.00  
7                 NaN  
8                0.00  
9                0.00  


### Observing Data Types

- Print the `dtypes` attribute of the DataFrame.
- How is our `DATE` column being stored?

In [5]:
print(hourly_weather_data.dtypes)

DATE                        object
REPORT_TYPE                 object
HourlyDryBulbTemperature    object
HourlyPrecipitation         object
dtype: object


### Parsing Dates

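Our `DATE` column was brought in as an `object` (string) type because `pd.read_csv()` does not parse dates unless we ask it to. Each timestamp contains characters other than digits (such as `-`, `T`, and `:`), so Pandas cannot store it as a number and falls back to the more general string type.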

In Lab 6, we'll need to be able to regroup our data based on different time intervals. To do this, we need to convert all of the dates in the `DATE` column (string/object type) into `datetime` objects. 

We can accomplish this by adding the `parse_dates=["DATE"]` parameter to our `pd.read_csv()` function.

- Run the `pd.read_csv()` function again with the added `parse_dates` parameter. 
- Print out the `dataframe.dtypes` and inspect the result. Is it different?
- Print out the first 10 rows of the DataFrame again.
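
The effect of `parse_dates` can be checked on a tiny in-memory CSV first. This is a sketch under assumptions: the two rows are made-up sample data in the same timestamp format as the lab's `DATE` column, and the names `without_parsing` and `with_parsing` are illustrative.

```python
import io

import pandas as pd

# Made-up two-row sample in the same timestamp format as the DATE column.
csv_text = """DATE,REPORT_TYPE
2009-01-01T00:53:00,FM-15
2009-01-01T01:53:00,FM-15
"""

without_parsing = pd.read_csv(io.StringIO(csv_text))
with_parsing = pd.read_csv(io.StringIO(csv_text), parse_dates=['DATE'])

# Check whether each DATE column holds real datetime values.
print(pd.api.types.is_datetime64_any_dtype(without_parsing['DATE']))  # False
print(pd.api.types.is_datetime64_any_dtype(with_parsing['DATE']))     # True

# Parsed values expose datetime attributes through the .dt accessor.
print(with_parsing['DATE'].dt.hour.tolist())  # [0, 1]
```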

In [6]:
hourly_weather_data = pd.read_csv(
    './data/raw_weather_data.csv', 
    usecols=['DATE','REPORT_TYPE',
             'HourlyDryBulbTemperature',
             'HourlyPrecipitation'],
    parse_dates=['DATE'],
    low_memory=False
)

print(hourly_weather_data.dtypes)
print(hourly_weather_data.head(10))

DATE                        datetime64[ns]
REPORT_TYPE                         object
HourlyDryBulbTemperature            object
HourlyPrecipitation                 object
dtype: object
                 DATE REPORT_TYPE HourlyDryBulbTemperature HourlyPrecipitation
0 2009-01-01 00:00:00       FM-12                       47                 NaN
1 2009-01-01 00:53:00       FM-15                       37                0.00
2 2009-01-01 01:53:00       FM-15                       35                0.00
3 2009-01-01 02:53:00       FM-15                       33                0.00
4 2009-01-01 03:53:00       FM-15                       36                0.00
5 2009-01-01 04:53:00       FM-15                       35                0.00
6 2009-01-01 05:53:00       FM-15                       33                0.00
7 2009-01-01 06:00:00       FM-12                       39                 NaN
8 2009-01-01 06:53:00       FM-15                       32                0.00
9 2009-01-01 07:53:00       FM-15                       32                0.00

- Notice that the data type for the `DATE` column is now listed as `datetime64[ns]` instead of just `object` (string).
- Notice that the dates in the `DATE` column are now displayed with a space between the date and time, rather than the `T` separator seen in the raw strings.

### Setting Our Index

Now we need to tell Pandas that we want to index all of our data by the `DATE` column rather than by the row number. We can also do this in our `pd.read_csv()` function by adding another parameter, `index_col="DATE"`.

- Run the `pd.read_csv()` function again with both the added `parse_dates=["DATE"]` and `index_col="DATE"` parameters. 
- Print out the first 10 rows of the DataFrame.

In [7]:
hourly_weather_data = pd.read_csv(
    './data/raw_weather_data.csv', 
    usecols=['DATE','REPORT_TYPE',
             'HourlyDryBulbTemperature',
             'HourlyPrecipitation'],
    parse_dates=['DATE'],
    index_col='DATE',
    low_memory=False
)

print(hourly_weather_data.head(10))

                    REPORT_TYPE HourlyDryBulbTemperature HourlyPrecipitation
DATE                                                                        
2009-01-01 00:00:00       FM-12                       47                 NaN
2009-01-01 00:53:00       FM-15                       37                0.00
2009-01-01 01:53:00       FM-15                       35                0.00
2009-01-01 02:53:00       FM-15                       33                0.00
2009-01-01 03:53:00       FM-15                       36                0.00
2009-01-01 04:53:00       FM-15                       35                0.00
2009-01-01 05:53:00       FM-15                       33                0.00
2009-01-01 06:00:00       FM-12                       39                 NaN
2009-01-01 06:53:00       FM-15                       32                0.00
2009-01-01 07:53:00       FM-15                       32                0.00



Notice that there is no longer a numerical row index. This is because Pandas is now using our `datetime` values as the index. This lets us slice rows by date and time and, later, join this DataFrame to others on matching timestamps.
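
As a small illustration of date-based slicing, the sketch below builds a three-row DataFrame from made-up sample data and selects one day's rows with a date string. The data and the names `sample` and `jan_first` are assumptions for this example only.

```python
import io

import pandas as pd

# Made-up sample spanning two days, indexed by parsed DATE values.
csv_text = """DATE,HourlyDryBulbTemperature
2009-01-01T00:53:00,37
2009-01-01T01:53:00,35
2009-01-02T00:53:00,40
"""

sample = pd.read_csv(
    io.StringIO(csv_text),
    parse_dates=['DATE'],
    index_col='DATE',
)

# With a datetime index, a date string selects every row on that day.
jan_first = sample.loc['2009-01-01']
print(jan_first)
print(len(jan_first))  # 2
```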