# LAB 2: Campaign Financial Contributions [Total: 8 points]

## Instructions

The purpose of this assignment is for you to work through a concrete data cleaning task in code. You will carry out the task in this notebook, using it to document each step of the exercise and to answer all questions.

In particular, we will be cleaning up some campaign donation data. The DC government makes data about campaign contributions (i.e., donations) available to the general public on its website [opendata.dc.gov](https://opendata.dc.gov). The website does not provide much documentation about the data.

We have created a smaller sample (`Campaign_Financial_Contributions_Sample.csv`) of this data for you to explore and clean. Follow the instructions in each question, and answer the accompanying questions. Note that some of the questions will require you to write a Python function to compute the correct answer.


## Important Warning
* Please ensure that you run the cell below before running any others. It downloads all required files and installs the packages needed for the code to run. If you restart the kernel or your runtime session (in Colab), be sure to rerun this cell before running any others.
* We recommend using **Google Colab** for this assignment. If you are using Anaconda Jupyter notebook/lab, please ensure that **this notebook is kept in a new folder**, because the commands below delete all files with the extensions `.csv` and `.py` in the current folder before downloading the required files.

In [None]:
required_files = "https://github.com/mainuddin-rony/inst447-fall2024/raw/main/assignment/lab/lab2/required_files.zip"
! rm -rf tests
! rm -f required_files.zip *.csv *.py
! wget $required_files && unzip -j required_files.zip
! mkdir tests && mv *.py tests
! pip install otter-grader==5.5.0

In [None]:
# Initialize Otter
import otter
grader = otter.Notebook()

# Part A: Inspecting the data frame

The first set of questions will ask you to inspect the data. Feel free to add new code cells and make sure to comment your code with markdown cells to describe your process of reaching the answer.
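As a rough illustration of what "inspecting" a data frame can look like, here is a minimal sketch on a tiny made-up CSV. The column names `DONOR`, `AMOUNT`, and `METHOD` and all values are hypothetical and are not taken from the assignment file; only the pandas calls carry over.

```python
import io
import pandas as pd

# Toy CSV standing in for a real file; column names and values are made up.
toy_csv = io.StringIO(
    "DONOR,AMOUNT,METHOD\n"
    "Alice,25.0,Check\n"
    "Bob,100.0,Credit Card\n"
    "Carol,50.0,Check\n"
)
toy = pd.read_csv(toy_csv)

print(toy.columns.tolist())           # list of column names
print(toy.dtypes)                     # type pandas inferred for each column
print(toy.head())                     # first few rows
print(toy["METHOD"].value_counts())   # distinct values in one column and their counts
```

Looking at the distinct values of a column is often the quickest way to guess what an undocumented column represents.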

## Q1 (Manual Grading)

**Points:** 1

**Question:** Read the sample campaign contribution data into a pandas data frame. Inspect the columns and cells of the data set. Which column most likely records the method of payment used for the donation/contribution?

1. `GIS_LAST_MOD_DTTM`

2. `DATEOFRECEIPT`

3. `CONTRIBUTIONTYPE`

Write down the correct answer in the following cell.

## Q2 (Auto Grading)

**Points:** 1

**Question:** How many donations (rows) in this dataset are from corporations?

Write a function called `numdoncorp()`. The function should take the name of the data file as its parameter and return the answer to the question as an int.
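Counting the rows that match a category usually comes down to a boolean mask. The sketch below uses a toy data frame; the column name `KIND` and its values are hypothetical, so you will need to find the relevant column and value in the real data yourself.

```python
import pandas as pd

# Hypothetical toy data; "KIND" and its values are assumptions for illustration.
toy = pd.DataFrame(
    {"KIND": ["Individual", "Business", "Individual", "Business", "Other"]}
)

# Comparing a column to a value yields a boolean Series (True where it matches).
is_business = toy["KIND"] == "Business"

# Summing booleans counts the True entries; int() converts the NumPy
# integer that pandas returns into a plain Python int.
count = int(is_business.sum())
print(count)  # 2
```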

In [None]:
import pandas as pd

def numdoncorp(path):
    ...

In [None]:
numdoncorp("Campaign_Financial_Contributions_Sample.csv")

When you're ready, run the cell below to get feedback on your answer.

In [None]:
grader.check("q2")

## Q3 (Manual Grading)

**Points:** 1

**Question:** True or False: The campaign contribution data set is structured data.

1. True
2. False

Write down the answer in the following cell.

# Part B: Cleaning inconsistent records

The next set of questions will ask you to clean the dataframe before computing the desired value.

## Q4 (Auto Grading)

**Points:** 1

**Question:** What is the minimum donation amount (do not round)? Note that in this and the following questions, donation amounts of 0 are likely mistakes and are NOT acceptable answers, because donors cannot give a $0 donation. Since some donations have an amount of 0, filter out these rows.

Write a function called `findLeastAmount()`. The function should take the name of the data file as its parameter and return the exact value, not a rounded one.
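To make the filtering step concrete, here is a minimal sketch of dropping zero-valued rows before aggregating. The data frame and the column name `AMOUNT` are made up for illustration and may not match the names in the assignment file.

```python
import pandas as pd

# Hypothetical toy data; the column name "AMOUNT" is an assumption.
toy = pd.DataFrame({"AMOUNT": [0.0, 12.5, 0.0, 3.75, 40.0]})

# Keep only the rows whose amount is not zero.
nonzero = toy[toy["AMOUNT"] != 0]

print(len(toy), len(nonzero))    # 5 3
print(nonzero["AMOUNT"].min())   # 3.75
```

Without the filter, the minimum of this toy column would be 0.0, which is exactly the kind of answer the question rules out.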

In [None]:
import pandas as pd

def findLeastAmount(path):
    ...

In [None]:
findLeastAmount("Campaign_Financial_Contributions_Sample.csv")

When you're ready, run the cell below to get feedback on your answer.

In [None]:
grader.check("q4")

## Q5 (Auto Grading)

<div class="alert alert-warning">Make sure to apply the same cleaning step from Q4</div>

**Points:** 1

**Question:** What is the maximum donation amount?

Write a function called `findMaxAmount()`. The function should take the name of the data file as its parameter and return the exact value, not a rounded one.

In [None]:
import pandas as pd

def findMaxAmount(path):
    ...

In [None]:
findMaxAmount("Campaign_Financial_Contributions_Sample.csv")

When you're ready, run the cell below to get feedback on your answer.

In [None]:
grader.check("q5")

## Q6 (Auto Grading)

<div class="alert alert-warning">Make sure to apply the same cleaning step from Q4</div>

**Points:** 1

**Question:** What is the most recent date listed in the column for DATEOFRECEIPT?

Write a function called `findRecentDate()`. The function should take the name of the data file as its parameter and return a Python string in the form `YYYY-MM-DD`.
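Dates read from a CSV typically arrive as plain strings, so comparing them reliably means parsing them first. The following sketch assumes a made-up column `RECEIVED` with a made-up timestamp layout; the actual format in the assignment file may differ, so check it before choosing a `format` string.

```python
import pandas as pd

# Hypothetical toy data; the column name and the timestamp layout are assumptions.
toy = pd.DataFrame(
    {"RECEIVED": ["2019/03/14 00:00:00", "2021/11/02 00:00:00", "2020/07/30 00:00:00"]}
)

# Parse the strings into real datetimes so that max() compares dates, not text.
parsed = pd.to_datetime(toy["RECEIVED"], format="%Y/%m/%d %H:%M:%S")

# Take the latest timestamp and format it as a YYYY-MM-DD string.
latest = parsed.max().strftime("%Y-%m-%d")
print(latest)  # 2021-11-02
```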

In [None]:
import pandas as pd

def findRecentDate(path):
    ...

In [None]:
findRecentDate("Campaign_Financial_Contributions_Sample.csv")

When you're ready, run the cell below to get feedback on your answer.

In [None]:
grader.check("q6")

## Q7 (Auto Grading)

<div class="alert alert-warning">Make sure to apply the same cleaning step from Q4</div>

**Points:** 1

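**Question:** How many missing values are there for contributor type (`CONTRIBUTORTYPE`)?

Write a function called `findMissingValues()`. The function should take the name of the data file as its parameter and return the number of missing values.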
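When pandas reads a CSV, empty cells become `NaN`, which `isna()` can detect. Here is a minimal sketch on a toy CSV; the columns `NAME` and `KIND` are hypothetical, and the sketch does not include the Q4 cleaning step.

```python
import io
import pandas as pd

# Toy CSV with two empty cells in the hypothetical "KIND" column.
toy_csv = io.StringIO(
    "NAME,KIND\n"
    "Alice,Individual\n"
    "Bob,\n"
    "Carol,Business\n"
    "Dan,\n"
)
toy = pd.read_csv(toy_csv)

# isna() marks missing cells as True; summing the booleans counts them.
missing = int(toy["KIND"].isna().sum())
print(missing)  # 2
```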

In [None]:
import pandas as pd

def findMissingValues(path):
    ...

In [None]:
findMissingValues("Campaign_Financial_Contributions_Sample.csv")

When you're ready, run the cell below to get feedback on your answer.

In [None]:
grader.check("q7")

## Submission

Don't forget to run all cells in your notebook and then save it. To save, click on `File`, then select `Save/Save Notebook`. After that, download the notebook: go to `File`, then `Download` (in an Anaconda notebook), or go to `File`, then `Download`, then `Download .ipynb` (in Colab). Finally, submit the notebook on Gradescope using the link found on ELMS.