# COVID-19 DATA PROJECT
## Data from the Johns Hopkins COVID-19 Project

> Data categories (CSV columns):
> `FIPS, Admin2, Province_State, Country_Region, Last_Update, Lat, Long_, Confirmed, Deaths, Recovered, Active, Combined_Key, Incidence_Rate, Case-Fatality_Ratio`

Things required:
- Show a graph of cases vs. date, with each **country** as its own coloured line
- Python graph, and possibly Power BI
- Data is divided into subregions (county, Province_State, Country_Region), so a loop is needed to sum up the subregions for each day
- Confirmed, Deaths and Recovered are cumulative numbers
- Calculate the daily new number (subtract the previous day's total from the current loop day's total)
- How to store the data: DB, file, memory or Power BI? Power BI may make further analysis easier, but Python should stay usable too
- Can read points and save them in a list, then use the lists to form xy coordinates (date, country_confirmed or country_deaths or country_recovered)
    - first need to total each country
- Steps
    1. Open the file
    2. If the variable exists, add to its total
        - if the variable doesn't exist, create it
    3. Read the first line and get the date (flag if the date is different?)
        - get the country name and create the variables country_confirmed etc. to hold the totals
    4. When the end of the file is reached, store the data as (date, variable) pairs
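
The totalling and daily-difference steps above could be sketched roughly as follows. This is a minimal illustration, not the project's final code: the names `FIELDS`, `sum_by_country` and `daily_new` are hypothetical, blank cells are assumed to count as zero, and the fallback to a `Country/Region` header is an assumption about how older daily reports may label the country column.

```python
import csv
import io
from collections import defaultdict

# Cumulative columns to total per country (taken from the data categories above).
FIELDS = ("Confirmed", "Deaths", "Recovered")


def sum_by_country(csv_text):
    """Total the cumulative counts of every subregion per country for one daily report."""
    totals = defaultdict(lambda: dict.fromkeys(FIELDS, 0))
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Assumption: older reports may label the column "Country/Region".
        country = row.get("Country_Region") or row.get("Country/Region")
        if not country:
            continue
        for field in FIELDS:
            # Assumption: a blank or missing cell counts as zero.
            totals[country][field] += int(float(row.get(field) or 0))
    return dict(totals)


def daily_new(series):
    """Turn (date, cumulative) pairs into (date, new) pairs.

    The first date is dropped because it has no previous day to subtract.
    """
    ordered = sorted(series)
    return [(day, count - prev)
            for (_, prev), (day, count) in zip(ordered, ordered[1:])]


# Made-up sample rows, only to show the shape of the data.
sample = (
    "Province_State,Country_Region,Confirmed,Deaths,Recovered\n"
    "Ontario,Canada,10,1,2\n"
    "Quebec,Canada,5,0,\n"
    ",Italy,100,7,3\n"
)
totals = sum_by_country(sample)
```

Collecting `(date, totals[country]["Confirmed"])` for each downloaded file gives the kind of series that `daily_new` expects, and the resulting pairs can be used directly as xy coordinates for the graph.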


In [None]:
baseurl = "https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_daily_reports/"


#from os import path
import os
import requests
from datetime import date, timedelta #for handling the date functions



def CreateDataDirectory():
  """
  Creates a directory called "Data" using the os library.

  os.path.isdir() returns True if the directory already exists, so the "not"
  in front of it means the directory is only created when it is missing.
  os.makedirs() is used instead of os.mkdir() so that missing parent folders
  are created as well.

  Returns:
      pc_path: the path of the data directory, used in the main function
  """
  pc_path = 'C:/Users/USER/AnacondaProjects/EdgeUpPython/Data/'

  if not os.path.isdir(pc_path):
    os.makedirs(pc_path)

  #Uncomment the following to delete the (empty) directory from within the python script
  #os.rmdir(pc_path)

  #returns the path to be used in the main function
  return pc_path


  
def GetCSV(conv_loop_date: str):
  """
  Returns the csv text from the specific web location
  Args:
      conv_loop_date: the date in MM-DD-YYYY format, appended to baseurl as the file name
  Returns:
      The text of the downloaded csv file
  """

  baseurl = "https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_daily_reports/"
  loop_url = baseurl + conv_loop_date + ".csv"
  
  #downloads the data from the assembled path/filename
  getfile = requests.get(loop_url)

  #stops with an error if the download failed (e.g. 404), so an error page is never saved as csv data
  getfile.raise_for_status()
  
  return getfile.text
  

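def CleanCSVString(incoming_data:str):
  '''
  Cleans the CSV string, since some of the retrieved files were encoded differently:
  removes a leading UTF-8 byte order mark (BOM) if present and converts Windows
  line endings to Unix line endings
  '''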
  cleanup_data = incoming_data.encode().decode('utf-8-sig')
  cleanup_data = cleanup_data.replace('\r\n', '\n')

  return cleanup_data

  


def WriteCSVtoFolder(pc_path: str, loop_date: str, test: str):
  """
  Writes the data that has been downloaded for a given date to a file named in YYYY-MM-DD format
  Args:
      pc_path: the directory and path that is created in CreateDataDirectory
      loop_date: date from the main function
      test: placeholder for the downloaded data
  Returns:
      The path of the file that was written
  """
  #builds the path and filename for the destination file
  outfile = pc_path + str(loop_date) + '.csv'

  #opens the destination file for writing; utf-8 avoids encoding errors on Windows,
  #and newline='' keeps the cleaned line endings unchanged
  with open(outfile, 'w', encoding='utf-8', newline='') as f:
    f.write(test)
    #print('write complete')
  return outfile


def main():
  """
  Main function. Contains a for loop to iterate through a range of dates. For each date the loop
  calls GetCSV to pull a data file off the website, CleanCSVString to correct for errors in the file,
  and WriteCSVtoFolder to write the file to the folder created by CreateDataDirectory.
  """
  #Calls on CreateDataDirectory to create the folder if it doesn't exist for the future files
  pc_path = CreateDataDirectory()

  # The size of each step in days for the item to iterate on.
  day_delta = timedelta(days=1)

  #defining the date interval as specified in the instructions
  #start_date: beginning of the requested data period
  #end_date: calculated from the start date and a number of following days (end_date itself is not downloaded)
  start_date = date(2020,3,1)
  end_date = start_date + 21*day_delta

  #Uses datetime.timedelta to iterate the for loop over the range of days
  for filedate in range((end_date - start_date).days):
    #increments the day counter for each iteration of the loop
    loop_date = start_date + filedate*day_delta

    #re-organizes the loop_date to match the format in the files on the website
    conv_loop_date = loop_date.strftime('%m-%d-%Y')

    #begins the call out to the different functions for the given date, writes the cleaned file
    file_request = GetCSV(conv_loop_date)
    file_cleanup = CleanCSVString(file_request)
    write_csvfile = WriteCSVtoFolder(pc_path, loop_date, file_cleanup)



if __name__ == "__main__":
  main()
