## File I/O in Python

Reading and writing files in Python is straightforward. Much of that power comes from open-source libraries that programming enthusiasts have developed over the past three decades. In this exercise, we will learn how to read from and write to a .csv file.

1. Datasets  
Let's download COVID-19 datasets from the web.
https://datahub.io/core/covid-19


The data is available in both .csv and .json formats. We will use the .csv files for our analysis.
We are using these two files:
* time-series-19-covid-combined_csv.csv
* key-countries-pivoted_csv.csv

In [4]:
# Importing basic libraries
import numpy as np
from matplotlib import pyplot as plt
import csv

In [2]:
file_tseries = "../data/time-series-19-covid-combined_csv.csv"
file_keycountries = "../data/key-countries-pivoted_csv.csv"

Before importing data from a file, first look at its fields in any text reader. Alternatively, open the file in Excel to see the fields in spreadsheet format.

![Time Series Covid-19 Combined data](tseries.png)
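The fields can also be inspected from Python itself. The following is a minimal sketch that prints each column index next to its name and a sample value. The in-memory `sample` text and its header names are assumptions made for illustration; only the column order is taken from the reading loop in this notebook.

```python
import csv
import io

# Hypothetical stand-in for the real file. Header names and values are
# made up; only the column order follows the reading loop in this notebook.
sample = io.StringIO(
    "Date,Country,State,Lat,Long,Confirmed,Recovered,Deaths\n"
    "2020-01-22,Exampleland,,1.0,2.0,0,0,0\n"
    "2020-01-23,Exampleland,,1.0,2.0,3,0,0\n"
)

reader = csv.reader(sample)
header = next(reader)      # first row holds the field names
first_row = next(reader)   # first data row

# Print each column index next to its name and a sample value
for index, (name, value) in enumerate(zip(header, first_row)):
    print(index, name, repr(value))
```

With the real data, `sample` would be replaced by `open(file_tseries, newline="")`.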

In [88]:
# One list per column; all values are kept as strings
date = []
country = []
state = []
lat = []
long = []
confirmed = []
recovered = []
deaths = []

# newline="" lets the csv module handle line endings itself
with open(file_tseries, newline="") as csv_file:
    csv_reader = csv.reader(csv_file, delimiter=",")
    next(csv_reader, None)  # skip the header row
    for row in csv_reader:
        date.append(row[0])
        country.append(row[1])
        state.append(row[2])
        lat.append(row[3])
        long.append(row[4])
        # Empty count fields are stored as '0'
        confirmed.append(row[5] or '0')
        recovered.append(row[6] or '0')
        deaths.append(row[7] or '0')
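
The loop above addresses columns by position. As an alternative sketch, `csv.DictReader` uses the header row as keys, so fields can be read by name. The header names and values below are hypothetical and would need to match the real file.

```python
import csv
import io

# Hypothetical sample; header names and values are assumptions.
sample = io.StringIO(
    "Date,Country,State,Lat,Long,Confirmed,Recovered,Deaths\n"
    "2020-04-10,Exampleland,,1.0,2.0,100,10,1\n"
    "2020-04-10,Otherland,North,3.0,4.0,50,,\n"
)

records = []
for row in csv.DictReader(sample):
    # Empty count fields become 0, mirroring the empty-field handling above
    for field in ("Confirmed", "Recovered", "Deaths"):
        row[field] = int(row[field] or 0)
    records.append(row)

print(records[1]["State"], records[1]["Recovered"])
```

Reading by name keeps the code valid if the column order changes, at the cost of depending on the exact header spelling.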

Having read the data, let's see what we can do with it:
1. Find unique countries in the list. *(NOTE: This is not that straightforward with a plain list. Try doing it.)*
2. Find the country with the highest number of total confirmed cases.
3. Find the country with the highest number of total deaths.
4. Plot confirmed cases for the top 20 countries.
5. Plot confirmed, recovered, and deaths w.r.t. time for India.

Most of these operations are much easier with a NumPy array.  
Hence, we generally prefer NumPy for our work!  
**Speed** is an added advantage of NumPy.
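
As a small illustration of why arrays are convenient here, the sketch below converts string counts to integers in one call and selects one country's rows with a boolean mask. The `sample_*` lists are made-up stand-ins for the lists built by the reading loop.

```python
import numpy as np

# Made-up values standing in for the lists built by the reading loop
sample_country = ["India", "India", "Italy", "Italy"]
sample_confirmed = ["10", "25", "40", "90"]  # strings, as returned by csv.reader

np_country = np.array(sample_country)
np_confirmed = np.array(sample_confirmed).astype(int)  # text to integers in one call

# A boolean mask selects all rows for one country without an explicit loop
india_mask = np_country == "India"
india_confirmed = np_confirmed[india_mask]

print(india_confirmed)
print(np_confirmed.max())
```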

### Finding unique countries

In [77]:
np_countries = np.array(country)
unique_countries = np.unique(np_countries)
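
For comparison, here is a sketch of the same task with and without NumPy, on a made-up `sample_country` list. `np.unique` returns the distinct values in sorted order. A `set` gives the same values once sorted, and `dict.fromkeys` keeps the order of first appearance instead.

```python
import numpy as np

sample_country = ["India", "Italy", "India", "Canada", "Italy"]  # made-up sample

# np.unique returns the distinct values in sorted order
unique_np = np.unique(np.array(sample_country))

# Pure-Python equivalent: a set removes duplicates, sorted() orders them
unique_py = sorted(set(sample_country))

# dict.fromkeys keeps first-appearance order instead of sorting
unique_in_order = list(dict.fromkeys(sample_country))

print(unique_np.tolist())
print(unique_py)
print(unique_in_order)
```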

### Segregating the data for the remaining tasks

In [94]:
# Calculating total confirmed cases for each country
cnfrmd_cases_cntrywise = dict.fromkeys(unique_countries,0)
deaths_cntrywise = dict.fromkeys(unique_countries, 0)
cntries_with_states = []

for i in range(len(date)):
    if state[i] == '':
        # Assuming cumulative counts in date order, the last row seen
        # for a country holds its latest total
        cnfrmd_cases_cntrywise[country[i]] = int(confirmed[i])
        deaths_cntrywise[country[i]] = int(deaths[i])
    else:
        cntries_with_states.append(country[i])

# There is an issue here: for some countries, the data is broken down by state.
# Identify those countries, then count the total number of confirmed cases and
# deaths in them.
cntries_with_states = np.unique(cntries_with_states)

# Calculate the required values for these countries.
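
One possible way to approach this step is sketched below on made-up rows. It is not the notebook's own solution. It assumes the counts are cumulative and that dates are in `YYYY-MM-DD` form, so they compare correctly as strings. It keeps the latest value per (country, state) pair, then sums over the states of each country.

```python
from collections import defaultdict

# Made-up rows: (date, country, state, confirmed); counts are cumulative
sample_rows = [
    ("2020-04-09", "Canada", "Alberta", "10"),
    ("2020-04-09", "Canada", "Ontario", "20"),
    ("2020-04-10", "Canada", "Alberta", "15"),
    ("2020-04-10", "Canada", "Ontario", "30"),
    ("2020-04-10", "India", "", "50"),
]

# Keep only the latest cumulative value per (country, state) pair
latest = {}
for row_date, row_country, row_state, row_confirmed in sample_rows:
    key = (row_country, row_state)
    if key not in latest or row_date > latest[key][0]:
        latest[key] = (row_date, int(row_confirmed))

# Sum the latest values of all states within each country
totals = defaultdict(int)
for (row_country, _state), (_date, count) in latest.items():
    totals[row_country] += count

print(dict(totals))
```

Rows with an empty state are included in the sum. This matters for countries that have both a country-level row and separate state or territory rows.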
        


In [96]:
# Highest number of total deaths and confirmed cases.

array(['Australia', 'Canada', 'China', 'Denmark', 'France', 'Netherlands',
       'United Kingdom'], dtype='<U14')

In [108]:
# Sort the dictionary items alphabetically by country name
sorted(cnfrmd_cases_cntrywise.items())
#plt.bar(cnfrmd_cases_cntrywise.keys(), cnfrmd_cases_cntrywise.values())

[('Afghanistan', 444),
 ('Albania', 400),
 ('Algeria', 1572),
 ('Andorra', 564),
 ('Angola', 19),
 ('Antigua and Barbuda', 19),
 ('Argentina', 1715),
 ('Armenia', 881),
 ('Australia', 0),
 ('Austria', 12942),
 ('Azerbaijan', 822),
 ('Bahamas', 40),
 ('Bahrain', 823),
 ('Bangladesh', 218),
 ('Barbados', 63),
 ('Belarus', 1066),
 ('Belgium', 23403),
 ('Belize', 8),
 ('Benin', 26),
 ('Bhutan', 5),
 ('Bolivia', 210),
 ('Bosnia and Herzegovina', 804),
 ('Botswana', 6),
 ('Brazil', 16170),
 ('Brunei', 135),
 ('Bulgaria', 593),
 ('Burkina Faso', 414),
 ('Burma', 22),
 ('Burundi', 3),
 ('Cabo Verde', 7),
 ('Cambodia', 117),
 ('Cameroon', 730),
 ('Canada', 0),
 ('Central African Republic', 8),
 ('Chad', 10),
 ('Chile', 5546),
 ('China', 0),
 ('Colombia', 2054),
 ('Congo (Brazzaville)', 45),
 ('Congo (Kinshasa)', 180),
 ('Costa Rica', 502),
 ("Cote d'Ivoire", 384),
 ('Croatia', 1343),
 ('Cuba', 457),
 ('Cyprus', 526),
 ('Czechia', 5312),
 ('Denmark', 5402),
 ('Diamond Princess', 712),
 ('Djibout
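
The `sorted` call above orders the items by country name. For the "highest number" and "top 20" tasks, the items need to be ranked by count instead. The sketch below shows this on a made-up `sample_totals` dictionary that stands in for `cnfrmd_cases_cntrywise`.

```python
# Made-up totals standing in for cnfrmd_cases_cntrywise
sample_totals = {"A": 30, "B": 120, "C": 75, "D": 5}

# Country with the highest count: max over keys, compared by their values
top_country = max(sample_totals, key=sample_totals.get)

# Top-N ranking: sort (country, count) pairs by count, largest first
top_n = sorted(sample_totals.items(), key=lambda item: item[1], reverse=True)[:3]

print(top_country)
print(top_n)
```

Changing the slice to `[:20]` gives the top 20. The names and counts from that list could then be passed to `plt.bar`.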

In [110]:
?base.sorted()

Object `base.sorted()` not found.
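
The lookup above fails because no object named `base` exists in the session. `sorted` is a Python builtin and is referenced directly by name. In IPython, `sorted?` shows its help. A minimal plain-Python sketch:

```python
# sorted is a builtin function, so it is looked up directly by name
doc = sorted.__doc__
print(doc)

# help(sorted) prints the same text together with the signature
```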


In [114]:
%load_ext watermark

# python, ipython, packages, and machine characteristics
%watermark -v -m -p numpy,matplotlib,csv

# date
print (" ")
%watermark -u -n -t -z

The watermark extension is already loaded. To reload it, use:
  %reload_ext watermark
CPython 3.7.4
IPython 7.8.0

numpy 1.17.2
matplotlib 3.1.1
csv 1.0

compiler   : Clang 4.0.1 (tags/RELEASE_401/final)
system     : Darwin
release    : 19.2.0
machine    : x86_64
processor  : i386
CPU cores  : 4
interpreter: 64bit
 
last updated: Sat Apr 11 2020 20:52:25 CEST
