[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github.com/onedataengineer21/notebooks/blob/main/python/Exploring%20Chicago%20Crimes%20Data%202022.ipynb)




COVID-19 was the largest pandemic the world has faced in over a century. It showed once again how hard it is for people to come together and solve a problem quickly. Millions of people lost their lives or livelihoods, and yet many of us carry on as if nothing happened, without being grateful to be alive today.

In [37]:
!python -m wget https://raw.githubusercontent.com/nytimes/covid-19-data/master/us-counties-2020.csv
!python -m wget https://raw.githubusercontent.com/nytimes/covid-19-data/master/us-counties-2021.csv
!python -m wget https://raw.githubusercontent.com/nytimes/covid-19-data/master/us-counties-2022.csv
!python -m wget https://raw.githubusercontent.com/nytimes/covid-19-data/master/us-counties-2023.csv

100% [....................................................] 35871900 / 35871900
Saved under us-counties-2020 (4).csv
100% [....................................................] 50311433 / 50311433
Saved under us-counties-2021 (4).csv
100% [....................................................] 51200840 / 51200840
Saved under us-counties-2022 (4).csv
100% [....................................................] 11541836 / 11541836
Saved under us-counties-2023 (4).csv


In [38]:
## importing the packages
import pandas as pd
import os
from pathlib import Path
import duckdb
import numpy as np

# Get the current working directory
current_working_directory = os.getcwd()

# Convert the current working directory to a Path object
script_dir = Path(current_working_directory)

### Extracting the datasets of the US counties from the NY Times GitHub repo


* We pull the US county datasets for 2020 through 2023
* Each of these datasets is in CSV format
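
As a rough sketch of what one of these files looks like once loaded, the snippet below parses a small inline sample instead of the real download. The column names (`date`, `county`, `state`, `fips`, `cases`, `deaths`) follow the layout of the NYT county files; the sample values and the `read_counties` helper are hypothetical and only for illustration. Reading `fips` as text is an assumption made here so that leading zeros are kept.

```python
import io
import pandas as pd

# Illustrative rows in the county-file layout (values are made up).
SAMPLE_CSV = """date,county,state,fips,cases,deaths
2020-03-01,Autauga,Alabama,01001,1,0
2020-03-02,Autauga,Alabama,01001,3,0
2020-03-01,Baldwin,Alabama,01003,2,1
"""

def read_counties(source):
    """Hypothetical helper: read one county file with explicit column types."""
    return pd.read_csv(
        source,
        dtype={"fips": str},    # keep leading zeros such as "01001"
        parse_dates=["date"],   # convert the date column to datetime64
    )

sample = read_counties(io.StringIO(SAMPLE_CSV))
print(sample.dtypes)
print(sample.head())
```

Passing a file name such as `"us-counties-2020.csv"` instead of the `StringIO` object should work the same way, since `pd.read_csv` accepts both paths and file-like objects.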

In [39]:
def extract_data(dataset_path):
    """
    Read a covid dataset from a csv file and return it as a dataframe.
    Returns None if the file is missing, empty, or cannot be parsed.

    Parameters
    ----------
    dataset_path : str
        Path to the csv file, including the file name
    """
    # Default result, so the return below never hits an undefined name
    covid = None
    try:
        covid = pd.read_csv(dataset_path)
    except FileNotFoundError:
        print(f"File not found: {dataset_path}")
    except pd.errors.EmptyDataError:
        print(f"No data: {dataset_path}")
    except pd.errors.ParserError:
        print(f"Parse error: {dataset_path}")
    return covid

### Transform the dataset

* Merge all the datasets from 2020 to 2023.
* Add new columns to the dataset - DailyCases and DailyDeaths
* Filter the data to a given state
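
One detail worth checking when deriving daily values from cumulative counts is where the differences are taken. The toy example below (made-up numbers, hypothetical county names `A` and `B`) compares a plain `diff()` over the sorted rows with a `diff()` taken inside each county. It is only a sketch of the idea, not the notebook's actual data.

```python
import pandas as pd

# Illustrative cumulative counts for two counties (values are made up).
toy = pd.DataFrame({
    "date":   ["2020-03-01", "2020-03-02", "2020-03-03"] * 2,
    "county": ["A", "A", "A", "B", "B", "B"],
    "state":  ["Alabama"] * 6,
    "cases":  [10, 15, 18, 2, 2, 7],
})
toy = toy.sort_values(["county", "date"])

# Plain diff: the first row of county B is compared with the last row of county A.
toy["naive"] = toy["cases"].diff().fillna(0).astype("Int64")

# Grouped diff: each county is differenced on its own, so its first row becomes 0.
toy["DailyCases"] = (
    toy.groupby("county")["cases"].diff().fillna(0).astype("Int64")
)

print(toy[["county", "date", "cases", "naive", "DailyCases"]])
```

In the `naive` column the first row of county `B` comes out as -16 (2 minus 18), while the grouped version gives 0. Note that with `fillna(0)` the first recorded day of each county is reported as 0 rather than as its initial cumulative count.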

In [40]:
def transform(df_list, statename):
    """
    Transform the dataframes by concatenating all the given dataframes
    Filter the dataframe for the given statename
    Create new columns - DailyCases and DailyDeaths
    Return the transformed dataframe

    Parameters
    ----------
    df_list : list
        list of dataframes to combine
    statename : str
        Name of the state to be used to filter the records
    """
    ### Concatenating the dataframes into one single dataframe
    covid = pd.concat(df_list, ignore_index=True)

    ### Filtering the dataframe for the given state, ordered by county and date
    covid = covid[covid.state == statename].sort_values(by=['county', 'date'])

    ### Adding Daily cases and Daily Deaths
    ### The diff is taken within each county, so a county's first row is not
    ### compared with the last row of the previous county
    covid['DailyCases'] = covid.groupby('county')['cases'].diff().fillna(0).astype('Int64')
    covid['DailyDeaths'] = covid.groupby('county')['deaths'].diff().fillna(0).astype('Int64')

    return covid

### Loading the data to csv files

In [41]:
def load(data, name):
    """
    Write the dataset in csv format to the given path and file name

    Parameters
    ----------
    data : dataframe
        The transformed dataframe
    name : str
        Output path ending in the state name, without the ".csv" extension
    """
    data.to_csv(name + ".csv")
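
`to_csv` does not create missing folders and by default writes the dataframe index as an extra first column. A possible variant that handles both is sketched below; `save_report` is a hypothetical name, the sample dataframe is made up, and a temporary directory stands in for the real output folder.

```python
import tempfile
from pathlib import Path

import pandas as pd

def save_report(data, name):
    """Hypothetical variant of load(): create the folder and skip the index column."""
    target = Path(name + ".csv")
    target.parent.mkdir(parents=True, exist_ok=True)  # no error if it already exists
    data.to_csv(target, index=False)
    return target

# Made-up report used only to show the round trip.
report = pd.DataFrame({"county": ["A", "B"], "DailyCases": [5, 0]})

with tempfile.TemporaryDirectory() as tmp:
    written = save_report(report, str(Path(tmp) / "covid" / "Alabama"))
    restored = pd.read_csv(written)  # read back before the folder is removed

print(restored)
```

Whether to drop the index is a design choice: after `pd.concat` the index values are usually not meaningful, so leaving them out keeps the output files a little cleaner.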

In [42]:
def generate_covidreport_statewise(statename):
    """
    Run the extract, transform and load steps for a single state

    Parameters
    ----------
    statename : str
        Name of the state to generate the report for
    """
    try:
        print(f"Generating the report for the state ::: {statename}")
        ##extracting the data
        covid2020 = extract_data("us-counties-2020.csv")
        covid2021 = extract_data("us-counties-2021.csv")
        covid2022 = extract_data("us-counties-2022.csv")
        covid2023 = extract_data("us-counties-2023.csv")

        ##transforming the data
        covid = transform([covid2020, covid2021, covid2022, covid2023], statename)

        ##loading the dataset
        load(covid, "data/output/covid/" + statename)
        print("Report generation completed")
    except Exception as exception:
        # Bind the exception to a name so it can be printed
        print("Exception in the report generation")
        print(exception)
    
    

In [44]:
state_names = ["Alaska", "Alabama", "Arkansas", "American Samoa", "Arizona", "California", "Colorado", "Connecticut", "District of Columbia", "Delaware", "Florida", "Georgia", "Guam", "Hawaii", "Iowa", "Idaho", "Illinois", "Indiana", "Kansas", "Kentucky", "Louisiana", "Massachusetts", "Maryland", "Maine", "Michigan", "Minnesota", "Missouri", "Mississippi", "Montana", "North Carolina", "North Dakota", "Nebraska", "New Hampshire", "New Jersey", "New Mexico", "Nevada", "New York", "Ohio", "Oklahoma", "Oregon", "Pennsylvania", "Puerto Rico", "Rhode Island", "South Carolina", "South Dakota", "Tennessee", "Texas", "Utah", "Virginia", "Virgin Islands", "Vermont", "Washington", "Wisconsin", "West Virginia", "Wyoming"]
for state in state_names:
    generate_covidreport_statewise(state)

Generating the report for the state ::: Alaska
Report generation completed
Generating the report for the state ::: Alabama
Report generation completed
Generating the report for the state ::: Arkansas
Report generation completed
Generating the report for the state ::: American Samoa
Report generation completed
Generating the report for the state ::: Arizona
Report generation completed
Generating the report for the state ::: California
Report generation completed
Generating the report for the state ::: Colorado
Report generation completed
Generating the report for the state ::: Connecticut
Report generation completed
Generating the report for the state ::: District of Columbia
Report generation completed
Generating the report for the state ::: Delaware
Report generation completed
Generating the report for the state ::: Florida
Report generation completed
Generating the report for the state ::: Georgia
Report generation 