# Exercises: Functions

KATE expects your code to define variables with specific names that correspond to certain things we are interested in.

KATE will run your notebook from top to bottom and check the latest value of those variables, so make sure you don't overwrite them.

* Remember to uncomment the line assigning the variable to your answer and don't change the variable or function names.
* Use copies of the original or previous DataFrames to make sure you do not overwrite them by mistake.

You will find instructions below about how to define each variable.

Once you're happy with your code, upload your notebook to KATE to check your feedback.

In this notebook, you will create numerous functions, all of which have a single parameter `data`.

Running the code cell below will assign to `latest` an example of an argument suitable for passing to each function as the `data` parameter. You'll see after each incomplete function a code cell which will call that function using `latest`, so that you can check your function is working as expected. **Don't change the function names**.

The dataset, which relates to Coronavirus cases across the world, was taken from [worldometers.info](https://www.worldometers.info/coronavirus/#countries) on February 19th, 2020.

We have used the `pandas` package for convenience to import and process the dataset from the `corona.csv` file, which you can examine via Jupyter or a spreadsheet application if you want to.

There's no need to understand the `pandas` code cell yet, although feel free to read it and have a think about what it's likely to be doing; you'll learn more about that later. **Complete the subsequent exercises using Python only**.

In [4]:
import pandas as pd

df = pd.read_csv('data/corona.csv').fillna(0).astype(dtype=int, errors='ignore').sort_values(by='Total Cases', ascending=False)

latest = df.to_dict('list')

In [5]:
latest

`latest` is a dictionary, where each key is a column heading in the CSV, and each value is a list containing the values in the given column from each row of the CSV.

Run the following code cells to further explore and understand the structure of the `latest` dictionary.

In [6]:
latest.keys()

We can therefore access the column and cell values as follows:

In [7]:
print(latest['Country'])

... and elements at a given position in all of the lists are from the same row of the CSV:

In [8]:
print(latest['Country'][0])
print(latest['Total Cases'][0])
print(latest['Total Deaths'][0])
print(latest['Recovered'][0])

In [9]:
print(latest['Country'][3])
print(latest['Total Cases'][3])
print(latest['Total Deaths'][3])
print(latest['Recovered'][3])

**When writing your functions, you can assume that the dataset will be ordered by `Total Cases`**, with the data for the countries with the highest number of cases coming first in each list. The number of rows in the CSV file may change, but the resulting lists (i.e. columns) will always be the same length as one another.
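
The structure described above can be checked with a short sketch. The `sample` dictionary below is a hypothetical miniature dataset with invented figures (it is not taken from `corona.csv`); it only mirrors the shape of `latest`.

```python
# Hypothetical miniature dataset in the same shape as `latest`;
# the figures are invented for illustration only.
sample = {
    'Country': ['Country A', 'Country B', 'Country C'],
    'Total Cases': [120, 15, 1],
    'Total Deaths': [3, 0, 0],
    'Recovered': [40, 15, 0],
}

# Every column should contain the same number of rows.
lengths = {column: len(values) for column, values in sample.items()}
same_length = len(set(lengths.values())) == 1

# The rows should be ordered from most to fewest total cases.
cases = sample['Total Cases']
is_sorted = cases == sorted(cases, reverse=True)

print(lengths)
print(same_length, is_sorted)
```

A small dictionary like this can also be passed to each of your functions as `data`, which makes the expected results easy to work out by hand.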

Your goal is to make a set of functions that can be re-used on any CSV file which is in the same format as `corona.csv` and as described above; thus if `corona.csv` were updated, all of your functions could be re-used to gather the same metrics as before.
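
To illustrate what "the same format" means without relying on `pandas`, here is a minimal sketch that builds a dictionary of lists using only the standard library `csv` module. The helper name `load_columns` and the CSV text are assumptions made up for this example; the notebook itself loads the data with `pandas`, and this sketch does not replicate its handling of missing values.

```python
import csv
import io

# Hypothetical CSV content with invented figures, standing in for a file
# that has the same column headings as corona.csv.
csv_text = """Country,Total Cases,Total Deaths,Recovered
Country A,120,3,40
Country B,15,0,15
Country C,1,0,0
"""


def load_columns(file_obj):
    """Read CSV rows and return a dictionary mapping each heading to a list."""
    reader = csv.DictReader(file_obj)
    columns = {name: [] for name in reader.fieldnames}
    for row in reader:
        for name in columns:
            value = row[name]
            # Convert whole-number strings to int; leave other text unchanged.
            columns[name].append(int(value) if value.isdigit() else value)
    return columns


sample = load_columns(io.StringIO(csv_text))
print(sample['Country'])
print(sum(sample['Total Cases']))
```

Any function that works on `sample` here should work equally well on `latest`, because both are dictionaries of equal-length lists keyed by column heading.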

We encourage you to re-use previous functions within other functions where possible.

**Q1.** Create a function which returns the worldwide number of reported cases, i.e. the sum of `Total Cases` from the `latest` dictionary:

- Call the function `case_count()`; it takes one parameter called `data`
- The `data` parameter represents a dictionary similar to `latest`
- The expression `data['Total Cases']` gives access to the `Total Cases` values
- You may find the `sum()` function useful for adding up the values in `data['Total Cases']`

The following skeleton offers some guidance:
```python
def case_count(data):
    total = sum(<statement>) 
    return total  
```

In [13]:
#add your code below
#def case_count(data):    

def case_count(data):
    total = sum(data['Total Cases']) 
    return total  

You can test your function using the following cell:

In [14]:
case_count(latest)

**Q2.** Create a function which returns the number of countries which have reported cases, i.e. the number of countries listed in `Country` from the `latest` dictionary:

- Call the function `country_count()`; it takes one parameter called `data`
- The `data` parameter represents a dictionary similar to `latest`
- The expression `data['Country']` gives access to the `Country` values
- You may find the `len()` function useful for counting the countries in `data['Country']`

The following skeleton offers some guidance:
```python
def country_count(data):
    countries_count = len(<statement>)
    return countries_count 
```

In [15]:
#add your code below
#def country_count(data):

def country_count(data):
    countries_count = len(data['Country'])
    return countries_count 



You can test your function using the following cell:

In [16]:
country_count(latest)

**Q3.** Create a function which returns the average number of cases over all listed countries:

- Call the function `average_cases()`; it takes one parameter called `data`
- The `data` parameter represents a dictionary similar to `latest`
- Use the `case_count()` and `country_count()` functions as part of your working to calculate the average number of cases, i.e.
```python
case_count(data)/country_count(data)
```


The following skeleton offers some guidance:
```python
def average_cases(data):
    average = <calculation>
    return average
```

In [17]:
#add your code below
#def average_cases(data):


def average_cases(data):
    average = case_count(data)/country_count(data)
    return average


You can test your function using the following cell:

In [18]:
average_cases(latest)
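
Because `latest` is large, it can be hard to tell by eye whether the results are right. One option is to check the three functions against a small dictionary whose answers can be worked out by hand. The sketch below repeats the functions so that it runs on its own; `sample` and `empty` are hypothetical datasets with invented figures.

```python
def case_count(data):
    return sum(data['Total Cases'])


def country_count(data):
    return len(data['Country'])


def average_cases(data):
    # Re-uses the two functions above rather than repeating their logic.
    return case_count(data) / country_count(data)


# Hypothetical miniature dataset; the figures are invented.
sample = {
    'Country': ['Country A', 'Country B', 'Country C'],
    'Total Cases': [120, 15, 1],
}

assert case_count(sample) == 136      # 120 + 15 + 1
assert country_count(sample) == 3
print(average_cases(sample))          # 136 divided by 3

# With no countries listed, the division has no defined result.
empty = {'Country': [], 'Total Cases': []}
try:
    average_cases(empty)
except ZeroDivisionError:
    print('average_cases() is undefined for an empty dataset')
```

The exercises assume the dataset always contains at least one row, so the empty case is shown only to make that assumption visible.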

**Q4.** Create a function which returns the number of countries where `Total Cases` equals `1`:

- Call the function `single_case_country_count()`; it takes one parameter called `data`
- The `data` parameter represents a dictionary similar to `latest`
- The expression `data['Total Cases']` gives access to the `Total Cases` values
- Consider using a `for` loop to iterate through the values in `Total Cases`
- Use an `if` statement within the `for` loop to check whether a value is equal to one: `== 1`
- Make sure to create a counter to track the number of countries matching this criterion: `count = 0`

The following skeleton offers some guidance:
```python
def single_case_country_count(data):
    
    count = 0
    
    for cases_count in data['Total Cases']:
        if cases_count == 1:
            <calculation>
            
    return <calculation>
```

In [19]:
#add your code below
#def single_case_country_count(data):

def single_case_country_count(data):

    count = 0

    for cases_count in data['Total Cases']:
        if cases_count == 1:
            count += 1

    return count



You can test your function using the following cell:

In [20]:
single_case_country_count(latest)
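
Once the loop version works, it may be useful to compare it with a more compact alternative. The sketch below is not the intended solution for KATE; the function names ending in `_loop` and `_compact` are hypothetical and exist only so that both versions can sit side by side. It relies on the list method `count()`, which returns how many elements equal the given value.

```python
def single_case_country_count_loop(data):
    # Explicit counter, as suggested in the exercise.
    count = 0
    for cases_count in data['Total Cases']:
        if cases_count == 1:
            count += 1
    return count


def single_case_country_count_compact(data):
    # list.count(1) counts the elements that are equal to 1.
    return data['Total Cases'].count(1)


# Hypothetical miniature dataset; the figures are invented.
sample = {
    'Country': ['Country A', 'Country B', 'Country C', 'Country D'],
    'Total Cases': [120, 15, 1, 1],
}

print(single_case_country_count_loop(sample))
print(single_case_country_count_compact(sample))
```

Both versions should agree on any dataset in this format; the loop version is easier to adapt when the condition becomes more complex than a simple equality.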

**Q5.** Create a function which returns a list of country names where the number of cases is equal to one:

Hint: you can use the `zip()` function in Python to iterate over two lists at the same time.
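
As a brief illustration of the hint, the sketch below shows how `zip()` pairs up elements at the same position in two lists. The lists are hypothetical and use invented figures.

```python
# Two hypothetical lists of equal length, aligned by position.
countries = ['Country A', 'Country B', 'Country C']
cases = [120, 15, 1]

# zip() yields one tuple per position: (countries[0], cases[0]), and so on.
pairs = list(zip(countries, cases))
print(pairs)

# The tuples can be unpacked directly in a for loop.
for country, count in zip(countries, cases):
    print(country, count)
```

Note that `zip()` stops at the end of the shorter list, which is not a concern here because all columns in the dataset have the same length.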

Please note you have been provided with the code for this question to carry out the necessary analysis. Simply uncomment the lines of code and run the code cell to produce the desired results.

In [22]:
#add your code below

def single_case_countries(data):
    countries = []
    for country, cases in zip(data["Country"], data["Total Cases"]):
        if cases == 1:
            countries.append(country)
    return countries




You can test your function using the following cell:

In [23]:
single_case_countries(latest)

**Q6.** Create a function which returns a list of countries in which there are still active cases, i.e. where `Total Cases` minus `Total Deaths` exceeds `Recovered`. You may find the `enumerate()` Python function helpful.
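
As a brief illustration of `enumerate()`, the sketch below shows how it supplies each element's position alongside the element itself, so that the same index can be used to look up values in other lists. The lists are hypothetical and use invented figures.

```python
# Two hypothetical lists of equal length, aligned by position.
countries = ['Country A', 'Country B', 'Country C']
recovered = [40, 15, 0]

# enumerate() yields (index, element) pairs, with the index starting at 0.
indexed = list(enumerate(countries))
print(indexed)

# The index can be used to read the matching value from another list.
for i, country in enumerate(countries):
    print(i, country, recovered[i])
```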


Please note you have been provided with the code for this question to carry out the necessary analysis. Simply uncomment the lines of code and run the code cell to produce the desired results.

In [24]:
#add your code below

def active_countries(data):
    countries = []
    for i, country in enumerate(data['Country']):
        if data['Total Cases'][i] - data['Total Deaths'][i] > data['Recovered'][i]:
            countries.append(country)
    return countries




You can test your function using the following cell:

In [25]:
active_countries(latest)

**Q7.** Create a function which returns a list of countries where there are no longer any active cases, i.e. where `Total Cases` minus `Total Deaths` equals `Recovered`. You may find the `enumerate()` Python function helpful.

Look at the previous question for inspiration and follow similar logic when creating your solution.

In [27]:
#add your code below

def cleared_countries(data):
    countries = []
    for i, country in enumerate(data['Country']):
        if data['Total Cases'][i] - data['Total Deaths'][i] == data['Recovered'][i]:
            countries.append(country)
    return countries


You can test your function using the following cell:

In [28]:
cleared_countries(latest)
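
One way to sanity-check Q6 and Q7 together is to confirm that, between them, the two functions account for every country. This holds only under the assumption that `Recovered` never exceeds `Total Cases` minus `Total Deaths`; a row that breaks this assumption would appear in neither list. The sketch below repeats both functions so that it runs on its own, and `sample` is a hypothetical dataset with invented figures.

```python
def active_countries(data):
    countries = []
    for i, country in enumerate(data['Country']):
        if data['Total Cases'][i] - data['Total Deaths'][i] > data['Recovered'][i]:
            countries.append(country)
    return countries


def cleared_countries(data):
    countries = []
    for i, country in enumerate(data['Country']):
        if data['Total Cases'][i] - data['Total Deaths'][i] == data['Recovered'][i]:
            countries.append(country)
    return countries


# Hypothetical miniature dataset; the figures are invented.
sample = {
    'Country': ['Country A', 'Country B', 'Country C'],
    'Total Cases': [120, 15, 1],
    'Total Deaths': [3, 0, 0],
    'Recovered': [40, 15, 0],
}

active = active_countries(sample)
cleared = cleared_countries(sample)
print(active)
print(cleared)

# Every country should appear in exactly one of the two lists.
accounted_for = sorted(active + cleared) == sorted(sample['Country'])
print(accounted_for)
```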

## Further Study & Practice

If you have time, here are some additional resources to look at and exercises to try.

#### a) Python Style Guide

You may have started to consider how your code looks to others, or whether there are any conventions you should be following for things such as variable naming, line breaks, and comments.

After having a look over the [PEP 8 Style Guide](https://www.python.org/dev/peps/pep-0008/), go through your code above and make any changes you think are appropriate. Consider whether it would help others or your future self to add any explanatory comments to aid understanding of what your code is doing.
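
As one possible illustration (a sketch of common conventions, not the only acceptable style), the function below uses lowercase names with underscores, spaces around operators, four-space indentation, and a docstring that explains what the function expects and returns. The `sample` dictionary is hypothetical and uses invented figures.

```python
def average_cases(data):
    """Return the mean number of cases per listed country.

    `data` is a dictionary of lists with 'Country' and 'Total Cases' keys.
    """
    total_cases = sum(data['Total Cases'])
    number_of_countries = len(data['Country'])
    return total_cases / number_of_countries


# Hypothetical miniature dataset; the figures are invented.
sample = {'Country': ['Country A', 'Country B'], 'Total Cases': [30, 10]}
print(average_cases(sample))  # 40 / 2 = 20.0
```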
  
#### b) Using Pandas

**Only if you have already started learning Pandas**, make a copy of this notebook and have a go at writing the functions again, using Pandas.  

To do so, in the top code cell, change `df` to `latest` and remove the final line (`latest = df.to_dict('list')`).

`latest` will now be a DataFrame, rather than a dictionary of lists. 

The subsequent code cells before the functions where the original `latest` is examined will not work, so delete them or replace the code with your own statements for examining the DataFrame.