# Worksheet 12

## MCS 260 Fall 2020 - Emily Dumas

## Problem 1 (pytest)

Here is a link to the `geom.py` module we developed in a previous lecture.  It allows you to represent Circles, Rectangles, and Squares in the plane and to perform some operations on them.

* [geom.py](https://raw.githubusercontent.com/emilydumas/mcs260fall2020/master/samplecode/geom/geom.py)

Download this file and save it in its own folder (e.g. `geomtesting` would be a good name for the folder).

Now create a file `test_geom.py` in the same folder that contains at least 4 tests for `geom.py` in the format expected by pytest.  That is, each test should be a function whose name begins with `test_`.  Running this script with the Python interpreter should do nothing, since it should only define test functions and never call them.

Run pytest and confirm that all of your tests are discovered, and that they all pass.
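Here is a minimal sketch of the layout pytest expects. The constructors in `geom.py` are not reproduced on this page, so the sketch tests a hypothetical stand-in function `rect_area` defined in the same file. In your own `test_geom.py` you would `import geom` and test its classes instead. By default pytest collects files named `test_*.py`, and within them, functions named `test_*`.

```python
"""Sketch of the layout pytest expects in a test file such as test_geom.py."""

# HYPOTHETICAL stand-in for the code under test.  In your real
# test_geom.py you would write `import geom` and test its classes.
def rect_area(width, height):
    """Return the area of a width-by-height rectangle."""
    return width * height


# pytest collects every function whose name starts with test_ ...
def test_area_of_unit_square():
    assert rect_area(1, 1) == 1


def test_area_uses_both_sides():
    # Integer inputs keep the comparison exact.
    assert rect_area(3, 4) == 12


# ... and nothing in this file calls the tests, so `python test_geom.py`
# prints nothing.  Running `pytest` in this folder is what executes them.
```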

Examples of suggested tests:
* two Circle objects created with the same center and radius compare as equal
* two Rectangle objects created with the same center, width, and height compare as equal
* applying the `.scale()` method to a specific Rectangle you choose gives the result you expect from a hand computation
* adding two example rectangles gives the bounding box you compute by hand

Note that the equality test for objects in `geom.py` is very fragile, because it tests floats for equality.  One way to avoid problems with floating point error is to choose examples where every value involved (width, height, center coordinates, radius, scale factor, etc.) is actually an integer.
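The snippet below illustrates why integer-valued examples are safer. Decimal fractions such as 0.1 pick up rounding error, while small integer-valued floats are exact. When non-integer values are unavoidable, you can compare individual numbers with a tolerance using `math.isclose` (pytest also offers `pytest.approx` for this).

```python
import math

# Decimal fractions like 0.1 have no exact binary representation,
# so arithmetic on them picks up tiny rounding errors.
print(0.1 + 0.2 == 0.3)              # False
print(0.1 + 0.2)                     # 0.30000000000000004

# Integer-valued floats of modest size are represented exactly,
# so the same kind of comparison succeeds.
print(2.0 * 3.0 == 6.0)              # True

# If non-integer values are unavoidable, compare with a tolerance.
print(math.isclose(0.1 + 0.2, 0.3))  # True
```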

## Prep for problems 2-5: Downloading data

The rest of the worksheet uses two data files that you need to download.

Download the following files and put them in a place where you can easily open them with the programs you'll write in this worksheet:

* [inttable.csv](https://dumas.io/teaching/2020/fall/mcs260/sampledata/inttable.csv)
* [uicscorecard.json](https://dumas.io/teaching/2020/fall/mcs260/sampledata/uicscorecard.json)

Now confirm that a Python script can find these files.  Modify the code below as needed: either save it to a file in the same directory as the downloaded data files, or change the filenames in the code.

```python
"""Check that the worksheet 12 data files are accessible"""
# To use this, save it to a script file on your computer.
# Then run it from a terminal.  Relative filenames are looked up in the
# current working directory, so run it from the folder holding the data.
import os

# MAY NEED TO CHANGE THIS depending on where the CSV file is located:
csv_filename = "inttable.csv"
# e.g. might need something like csv_filename = r"C:\Users\sramanaj\Downloads\inttable.csv"

if os.path.exists(csv_filename):
    print("CSV file found successfully")
else:
    print("PROBLEM: CSV file not found.  Either change `csv_filename` to include the path or move this script to the folder containing the file.")

# MAY NEED TO CHANGE THIS depending on where the JSON file is located:
json_filename = "uicscorecard.json"
# e.g. might need something like json_filename = r"C:\Users\sramanaj\Downloads\uicscorecard.json"

# Check json_filename here (not csv_filename) so this test is meaningful.
if os.path.exists(json_filename):
    print("JSON file found successfully")
else:
    print("PROBLEM: JSON file not found.  Either change `json_filename` to include the path or move this script to the folder containing the file.")
```

## CSV data introduction

**Read this before working on problems 2-3.**

The file `inttable.csv` you downloaded contains 200 columns of data, generated by a script written by the instructor.  It has a header row followed by 1000 rows of data.  Each datum is a positive integer, but the `csv` module returns every value as a string, so convert with `int()` before comparing values.

Reminder: Since this CSV file has a header row, it is recommended to use a `csv.DictReader` object to read it.
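As a quick illustration of how `csv.DictReader` behaves, the sketch below reads a tiny made-up CSV from a string via `io.StringIO` (the contents are invented, not taken from `inttable.csv`). Each row becomes a dictionary keyed by the header names, and every value is a string.

```python
import csv
import io

# A tiny made-up CSV with a header row (not the real inttable.csv).
sample = "alpha,beta\n3,14\n15,9\n"

# With the real file you would write:
#     with open(csv_filename, newline="") as f:
#         rows = list(csv.DictReader(f))
rows = list(csv.DictReader(io.StringIO(sample)))

first = rows[0]
print(first["alpha"], type(first["alpha"]))  # 3 <class 'str'>
print(int(first["alpha"]) + 1)               # 4
```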

## Problem 2 (CSV)

One of the columns in the CSV file is called `fabric`.  Write a script to determine the largest integer in that column.
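One possible approach is sketched below. The helper name `max_in_column` is hypothetical, and the sample data is invented to stand in for `inttable.csv`. The key step is converting each string to `int` before comparing, because string comparison is alphabetical.

```python
import csv
import io

def max_in_column(reader, column):
    """Return the largest integer in `column` of a csv.DictReader."""
    # Convert each string to int before comparing; comparing the strings
    # themselves would be alphabetical, so "9" > "10" would be True.
    return max(int(row[column]) for row in reader)

# Made-up sample data standing in for inttable.csv.
sample = "fabric,other\n9,1\n10,2\n7,3\n"
print(max_in_column(csv.DictReader(io.StringIO(sample)), "fabric"))  # 10
```

With the real file, you would open it with `open(csv_filename, newline="")` and pass `csv.DictReader(f)` to the same function.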

## Problem 3 (CSV)

Write a script to find the largest integer that appears in *any* column of `inttable.csv`, and to report which column it was found in.

If there are ties (i.e. the largest number appears several times), report each one.
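A sketch of one way to track ties follows, using an invented three-column sample and a hypothetical helper `find_overall_max`. It keeps a list of every location holding the current maximum and resets that list whenever a strictly larger value appears. It also records the row number, which you can drop if you only need the column.

```python
import csv
import io

def find_overall_max(reader):
    """Return (largest value, list of (row number, column) where it occurs)."""
    best = None
    locations = []
    # Data rows are numbered from 1 (the header row is not counted).
    for rownum, row in enumerate(reader, start=1):
        for column, text in row.items():
            value = int(text)
            if best is None or value > best:
                best = value                  # new record: forget old spots
                locations = [(rownum, column)]
            elif value == best:
                locations.append((rownum, column))  # a tie
    return best, locations

sample = "a,b,c\n5,8,2\n8,1,3\n"
print(find_overall_max(csv.DictReader(io.StringIO(sample))))
# (8, [(1, 'b'), (2, 'a')])
```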

## JSON data introduction

**Read this before working on problems 4-5.**

The file `uicscorecard.json` you downloaded was obtained from a query submitted to the US Department of Education's College Scorecard API:

* https://collegescorecard.ed.gov/data/documentation/

The query requested information about the University of Illinois at Chicago.  The resulting JSON file contains certain information compiled by the Department of Education about UIC between 1998 and 2018.

While you won't need it for this assignment, you can find complete documentation of the fields in this JSON file at
* https://collegescorecard.ed.gov/assets/CollegeScorecardDataDictionary.xlsx

## Problem 4 (JSON)

Open the JSON file `uicscorecard.json` and load it using the `json` module.  Store the result in a variable.

* (A) The top-level object in this file is a dictionary.  What are its keys?

* (B) One of the keys is called `results`, and the associated value is a list.  How many elements are in this list?
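The general pattern for loading JSON and inspecting the top level is sketched below. To keep it self-contained, it parses an invented JSON string through `io.StringIO`. The keys shown are placeholders and are not a claim about what `uicscorecard.json` contains.

```python
import io
import json

# Invented stand-in for the file's contents; the real keys may differ.
text = '{"metadata": {"page": 0}, "results": [{"id": 1}]}'

# With the real file:
#     with open(json_filename) as f:
#         data = json.load(f)
data = json.load(io.StringIO(text))

print(list(data.keys()))     # ['metadata', 'results']
print(len(data["results"]))  # 1
```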

## Problem 5 (JSON)

Let's assume that a variable `data` stores the dictionary read from `uicscorecard.json`.  (You should write code to accomplish that before proceeding.)

Then `data["results"][0]` gives a dictionary of data about UIC.  Some of the keys are years, which lead to year-specific data.

For example, the fraction of degrees awarded at UIC in 2017 that were in mathematics can be retrieved as:

`data["results"][0]["2017"]["academics"]["program_percentage"]["mathematics"]`

Here, `"2017"` is the year, given as a string.  The value

`data["results"][0]["2017"]["academics"]["program_percentage"]`

is a dictionary containing fractions of degrees awarded in that year in various subjects.
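Because some years may be missing intermediate levels, a small helper that follows a chain of keys and returns `None` on any gap can be convenient. The sketch below uses a hypothetical helper `get_path` and an invented miniature of `data["results"][0]` with placeholder values, not real figures. JSON keys are always strings, so the year keys can be picked out with `str.isdigit()`.

```python
def get_path(obj, *keys):
    """Follow keys into nested dicts; return None if any level is missing."""
    for key in keys:
        if not isinstance(obj, dict) or key not in obj:
            return None
        obj = obj[key]
    return obj

# Invented miniature of data["results"][0]; placeholder values only.
school = {
    "id": 1,
    "2016": {"academics": {"program_percentage": {"mathematics": None}}},
    "2017": {"academics": {"program_percentage": {"mathematics": 0.02}}},
}

# Year keys are strings like "2017", so select them with str.isdigit().
years = sorted(k for k in school if k.isdigit())
for year in years:
    print(year, get_path(school, year, "academics", "program_percentage", "mathematics"))
```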

* (A) For which years does this dataset contain information about the fraction of degrees awarded in mathematics?  (A value of `None` indicates missing data.)
* (B) What other subjects are tracked, besides mathematics?  (Base your answer on the 2017 data.)
* (C) Write a program to print a table of years and the fraction of degrees awarded in mathematics, and in engineering, for each year in which at least one of those fractions is available.
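A sketch for part (C) appears below, run on an invented miniature dataset with placeholder numbers. It assumes the engineering entry is keyed `"engineering"`; check that against the 2017 keys you found in part (B). The `or {}` guards treat a missing or `null` level as empty, so years lacking data are skipped instead of raising an error.

```python
# Invented miniature of data["results"][0]; placeholder numbers only.
# ASSUMPTION: the engineering entry is keyed "engineering".
school = {
    "2015": {"academics": {"program_percentage": {"mathematics": None, "engineering": None}}},
    "2016": {"academics": {"program_percentage": {"mathematics": None, "engineering": 0.1}}},
    "2017": {"academics": {"program_percentage": {"mathematics": 0.02, "engineering": 0.12}}},
}

def fmt(x):
    """Show a fraction with 4 decimals, or 'n/a' for missing data."""
    return "n/a" if x is None else f"{x:.4f}"

print(f"{'year':>4}  {'math':>8}  {'engineering':>11}")
table_rows = []
for year in sorted(k for k in school if k.isdigit()):
    # A missing or null level is treated as an empty dict.
    academics = school[year].get("academics") or {}
    pct = academics.get("program_percentage") or {}
    math_frac, eng_frac = pct.get("mathematics"), pct.get("engineering")
    if math_frac is None and eng_frac is None:
        continue  # skip years where neither fraction is available
    table_rows.append((year, math_frac, eng_frac))
    print(f"{year:>4}  {fmt(math_frac):>8}  {fmt(eng_frac):>11}")
```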



## Extension activity

**You can work on this if you finish all the problems above during discussion.**

The file `uicscorecard.json` contains a lot of interesting data about UIC.  It also has a lot of fields with missing data, and it uses a very complex hierarchy of fields.

By exploring the keys of various dictionary elements in this file using the Python REPL, try to find interesting information and compile it into a more condensed form (e.g. a table showing how some quantity varies across years, or across academic disciplines).
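When exploring in the REPL, a small recursive helper that lists key paths a few levels deep can save a lot of typing. The helper `key_paths` below is hypothetical, and the example dictionary is invented. It expands only nested dictionaries; lists are summarized by their length.

```python
def key_paths(obj, prefix="", max_depth=2):
    """Return 'path -> summary' lines for nested dict keys, up to max_depth levels."""
    lines = []
    if max_depth == 0 or not isinstance(obj, dict):
        return lines
    for key, value in obj.items():
        path = f"{prefix}[{key!r}]"
        # Summarize the value: container type and length, or just its type.
        if isinstance(value, (dict, list)):
            summary = f"{type(value).__name__} of length {len(value)}"
        else:
            summary = type(value).__name__
        lines.append(f"{path} -> {summary}")
        # Recurse into dicts only; lists are summarized but not expanded.
        lines.extend(key_paths(value, path, max_depth - 1))
    return lines

example = {"2017": {"academics": {"program_percentage": {}}, "cost": None}}
for line in key_paths(example):
    print(line)
```

On the real data you might start with something like `key_paths(data["results"][0]["2017"], max_depth=1)` and increase the depth as you narrow in.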

Consult the data dictionary for help with the field meanings:
* https://collegescorecard.ed.gov/assets/CollegeScorecardDataDictionary.xlsx

(This is a Microsoft Excel file, but it can also be read by Google Sheets which you can access at [gdocs.uic.edu](http://gdocs.uic.edu).)