# Assignment Guidelines

- Please complete each exercise in the (one) code cell that is provided.  
- Delete all extra cells you have inserted for development and debugging purposes before turning in this assignment.
- Delete or comment out all print statements used for debugging.  Print statements that display the answers have already been coded into this notebook.
- Execute your code so that the solutions are showing.  Then save your file before you turn it in.
- All exercises are to be completed as directed with the Python methods indicated.  Importing <code>numpy</code>,  <code>pandas</code>, or other packages is not permitted: the goal of this assignment is to develop competency in base Python methods.

# Sources of Data for this Assignment

## Weather Data

This data was downloaded from the National Climatic Data Center (NCDC), which is part of the National Oceanic and Atmospheric Administration (NOAA).  Getting similar data requires going to this web page:

[NOAA/NCDC](https://www.ncdc.noaa.gov/cdo-web/datatools)

and clicking on <code>Select a Location</code>

I then chose <code>Daily Summaries>Zip Code>23185</code>, for example, to get data for Williamsburg.  Further navigation and selection are required but are not documented here.

## US Census Data

This data comes from:

[USCensus](https://api.census.gov)

In particular, this code file was used to download and generate this data set: <code>files/dev_data.py</code>

## Bluebikes Rentals of Boston

This data set comes from bike rental in Boston, MA, whose website is:

[https://www.bluebikes.com/system-data](https://www.bluebikes.com/system-data)

This page describes the data fields, and the data use license information can be found at:

[https://www.bluebikes.com/data-license-agreement](https://www.bluebikes.com/data-license-agreement)

Consistent with that agreement, monthly data files have been downloaded from

[https://s3.amazonaws.com/hubway-data/index.html](https://s3.amazonaws.com/hubway-data/index.html)

and combined to create these reasonably large data files at
- https://jrbrad.people.wm.edu/data/ctba/bluebikes.csv
- https://jrbrad.people.wm.edu/data/ctba/bluebikes_clean.csv


## Mileage Between Cities

This data set comes from https://www.mapcrow.info/united_states.html.

# Exercise 1: Fibonacci Series (<code>for</code> Loops)

The Fibonacci Series is a sequence of integers, (1, 1, 2, 3, 5, 8, ...), where, after the first two elements, each element is the sum of the prior two elements.  Create a list called <code>fibSeries1</code> that contains the first 100 terms in this series.
    
__Reflect: why does a <code>for</code> loop make sense for this problem?__

In [None]:
''' Put your code here '''

''' Please leave the print statements below '''
print(len(fibSeries1))
print(fibSeries1)

# Exercise 2: Fibonacci Series (<code>while</code> Loops)

Create a list called <code>fibSeries2</code> that contains all elements in this series that are less than 1000.
    
__Reflect: why does a <code>while</code> loop make sense for this problem?__
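As a loose illustration of the distinction, the sketch below builds powers of 2 rather than Fibonacci terms, so it is not a solution to either exercise. The names `first_five` and `under_1000` are hypothetical.

```python
# for loop: the number of iterations is known in advance
first_five = [1]
for _ in range(4):
    first_five.append(first_five[-1] * 2)

# while loop: keep going until a condition on the values fails
under_1000 = [1]
while under_1000[-1] * 2 < 1000:
    under_1000.append(under_1000[-1] * 2)

print(first_five)
print(under_1000)
```

In the first loop the count is fixed before the loop starts; in the second, the count is only discovered when the next value would cross the limit.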

In [None]:
''' Put your code here '''

''' Please leave the print statements below '''
print(len(fibSeries2))
print(fibSeries2)

# Exercise 3: Filter Fibonacci

Use list comprehension to filter the elements in <code>fibSeries1</code> from the exercise above: assign to a variable named <code>fibSeries3</code> all elements in <code>fibSeries1</code> that are a multiple of 3.

In [None]:
''' Put your code here '''

''' Please leave the print statements below '''
print(len(fibSeries3))
print(fibSeries3)

# Exercise 4: Filter Fibonacci

Use list comprehension to process the elements in <code>fibSeries1</code> from an exercise above and assign to a variable named <code>fibSeries4</code> either the element value from <code>fibSeries1</code> if it is a multiple of 3 or, otherwise, the string <code>'skip'</code>.
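The two comprehension forms used in this exercise and the previous one differ in where the condition is placed. The sketch below shows both on a small hypothetical list (`nums`) with an even/odd test, which is an assumption chosen only for illustration.

```python
nums = [4, 7, 10, 15, 20]  # hypothetical sample data

# Filtering: an `if` clause at the end drops elements that fail the test
evens = [n for n in nums if n % 2 == 0]

# Mapping: a conditional expression before `for` keeps every position
labeled = [n if n % 2 == 0 else 'odd' for n in nums]

print(evens)
print(labeled)
```

The filtered list can be shorter than the input, while the mapped list always has the same length as the input.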

In [None]:
''' Put your code here '''

''' Please leave the print statements below '''
print(len(fibSeries4))
print(fibSeries4)

# Exercise 5: Data Input

Input the data file <code>files/miles.csv</code> into a list of lists named <code>data</code>.  You may use <code>for</code> loops or list comprehension.
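One possible shape for this kind of input, sketched on made-up text rather than the real file: `io.StringIO` stands in for an open file so the example is self-contained, and the comma-separated layout is an assumption.

```python
import io

# Stand-in for an open file; real code would use open(...) instead
f = io.StringIO("0,5,9\n5,0,4\n9,4,0\n")

# Each line becomes one inner list; strip() removes the trailing newline
rows = [line.strip().split(',') for line in f]

print(rows)
```

Note that the values arrive as strings; converting them to numbers, if desired, is a separate step.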

In [None]:
''' Put your code here '''

''' Please leave the print statements below '''
print(len(data))
print(data)

# Exercise 6: U.S. Census Data (List Comprehension)

The <code>files</code> folder contains census population data.  In particular, this file contains the data:

<code>DECENNIALPL2020.P1_data_with_overlays_2021-11-05T143124.zip</code>

which you will need to extract in order to create this file within the <code>files</code> folder:

<code>DECENNIALPL2020.P1_data_with_overlays_2021-11-05T143124.csv</code>

This next file contains information about the column/field headings, although it is a bit cryptic:

<code>DECENNIALPL2020.P1_metadata_2021-11-05T143124.csv</code>

The data in the first file contains population in "census tracts".  A tract is an area of land that is part of a county, which is part of a state, which is, obviously, part of the United States.  So, this data drills down into small pieces of land and documents the population of each.  There are many population fields in this data, including those based on race.  You are interested in the total population of the tracts.

Your job is to create a list of the total population in the tracts by _list comprehension_.  The total tract population has a column name of <code>P1_001N</code> or, alternatively, <code>!!Total:</code>.  (There are essentially two lines with alternate field names.)  You should not include the data from the column/field headings in the list.

Warning: this data file is formatted in such a way that makes it a bit tricky, but you will figure it out!

In [None]:
''' Put your code here '''

''' Input the data '''
    
''' Create a list with list comprehension '''
pop_list = 

''' Please do not revise the statements below '''
print(len(pop_list))
print(pop_list)

# Exercise 7: Frequency Histogram

Assume that you want to analyze the data from the exercise above by plotting a frequency histogram of it.  Create a frequency histogram dictionary that could be used to make such a graph.  The keys should be the unique tract population values from the list entitled <code>pop_list</code> and the values should be the number of times a particular population value occurs.

Do this with a <code>for</code> loop.  You may refer to the variable <code>pop_list</code> from the code cell above without needing to recreate it: variables defined in one Jupyter cell are available to  code in other cells if the former cells have been executed.  Please use the name <code>fh_pop</code> for the dictionary.
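A minimal sketch of the counting pattern on a small hypothetical list (`values` and `counts` are illustrative names, not part of the assignment):

```python
values = [3, 1, 3, 2, 1, 3]  # hypothetical sample data

counts = {}
for v in values:
    # get() returns 0 the first time a value is seen
    counts[v] = counts.get(v, 0) + 1

print(counts)
```

Using `dict.get` with a default is one of several ways to handle a key that has not been seen yet; an explicit `if v in counts` test works equally well.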

In [None]:
fh_pop = {}
''' Put your code here '''


''' Please do not revise the statements below '''
print(len(fh_pop))
print(fh_pop)

# Exercise 8: Mileage Matrix Input

These files contain data on the mileage between cities in the United States:
- <code>files/miles.csv</code>
- <code>files/cities.csv</code>

The former file contains the mileage between each pair of cities and the latter file lists the cities, which can be thought of as the labels for both rows and columns.

Create a dictionary named <code>dist</code> whose keys are the city names and the values for each are lists of the mileage from that city to the other cities in the order they appear in each row of <code>miles.csv</code>.
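The general idea of pairing a list of labels with a list of rows can be sketched as follows. The labels and numbers are made up, and the sketch assumes the labels and rows are already in matching order.

```python
labels = ['A', 'B', 'C']                   # hypothetical labels
rows = [[0, 5, 9], [5, 0, 4], [9, 4, 0]]   # hypothetical rows, same order

table = {}
for label, row in zip(labels, rows):
    table[label] = row

print(table)
```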

In [None]:
''' Put your code here '''
dist = {}

''' Please do not revise the statements below '''
print(len(dist))
print(dist)

# Exercise 9: Set Comprehension

The following file contains data from a restaurant about the number of guests in parties that have visited the restaurant:

<code>files/guests.json</code>

Input the data using the <code>json</code> module and create a set, named <code>guest_set</code>, that contains the unique values for the number of guests in the parties reflected in the data.
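For reference, a set comprehension has the same form as a list comprehension but uses braces. The sketch below runs on a hypothetical JSON string; the actual layout of <code>guests.json</code> may well be different, so inspect the loaded data before adapting this.

```python
import json

# Hypothetical JSON text; the real file's structure is not assumed here
text = '[{"table": 1, "size": 4}, {"table": 2, "size": 2}, {"table": 3, "size": 4}]'
records = json.loads(text)

# Braces build a set, so duplicate values collapse automatically
sizes = {rec["size"] for rec in records}

print(sizes)
```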

In [None]:
import json

''' Input the data '''
with open('files/guests.json') as f:
    guests = json.load(f)
    
''' Put your code here '''
guest_set = 

''' Please do not revise the print statements below '''
print(len(guest_set))
print(str(type(guest_set)))
print(list(guest_set))

# Exercise 10: Set Comprehension

Input the data in the Boston Bluebikes database as described above.  Code has been provided in the cell below to access this fairly large data set from the Internet at: <code>https://jrbrad.people.wm.edu/data/ctba/bluebikes_clean.csv</code>   

Using that data, create a set named <code>bike_stations</code> that contains all the unique station names found in the <code>"start station name"</code> data field.

This data set is almost 1GB and, in my experience, can take a few minutes to access.  As is good practice in code development, you should use a smaller version of this file during code development so that you do not have to wait a long time to discover the next bug that needs fixing.  Toward that end, a smaller version of this file has also been provided, as well as code to access it, which is currently commented out.  Remove the comment syntax to use it.

Please note that regardless of which data you use, the variable <code>data</code> is an iterator (a generator, in the case of the large file), so it can be traversed only once.
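One way to work with such a stream is to consume the heading line with <code>next()</code> and then iterate over the rest. The sketch below uses a small hypothetical iterator in place of <code>data</code>, and it assumes that no field contains a comma and that headings are unquoted, which may not hold for the real file.

```python
lines = iter([
    'name,color',
    'apple,red',
    'cherry,red',
    'lime,green',
])  # hypothetical stand-in for an iterator of CSV lines

header = next(lines).split(',')   # consumes the first line only
col = header.index('color')       # position of the field of interest

# The comprehension sees only the lines that remain after the header
colors = {line.split(',')[col] for line in lines}

print(colors)
```

After the comprehension finishes, the iterator is exhausted, so a second pass over `lines` would yield nothing.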

In [None]:
import requests

''' Get the large data set from the Internet with the requests module '''
''' This code creates a generator that presents the data from the file line by line '''

url = 'https://jrbrad.people.wm.edu/data/ctba/bluebikes_clean.csv'
response = requests.get(url)
data = (line for line in response.text.split('\r\n'))
    
#print(f'Acquisition time: {time.time() - start} seconds')

''' Input the small version of the data file '''
#with open('files/bluebikes_small.csv', 'r') as f:
#    data = f.readlines()
#data = iter(data)

''' Put your code here '''


''' Please do not revise the print statements and other statements below '''
bike_stations = list(bike_stations)
bike_stations.sort()
print(len(bike_stations))
print(str(type(bike_stations)))
print(bike_stations)

# Exercise 11: More Bluebikes Analysis (Set Comprehension)

Managers of the Bluebikes operation would be concerned with control of their assets, in particular, their bikes.  If a bicycle with a particular ID was not used for a sufficiently long period, then managers should consider the possibility that the bike is lost, stolen, or located someplace where potential users cannot find it.  To get a gauge on how many of its bicycles have been in circulation, an analyst could compute a set of bike IDs using set comprehension to determine which bikes were actively being used.  Subsequent comparison with a list of bike IDs that should be in the fleet would indicate which bikes were missing.

Please note that the variable <code>data</code> is an iterator and so you can use the techniques we discussed in class to iterate through the stream of data it makes available.

Create such a set of unique bike IDs with set comprehension, named <code>bike_id</code>.  Code has been provided to print out the number of bikes in that set and the data itself.  The bike IDs should be in an appropriate numerical data type.

In [None]:
import requests

''' Get the large data set from the Internet with the requests module. This initial code block '''
'''   - Creates a generator that presents the data from the file line by line '''

url = 'https://jrbrad.people.wm.edu/data/ctba/bluebikes_clean.csv'
response = requests.get(url)
data = (line for line in response.text.split('\r\n'))
    
''' Input the small version of the data file '''
#with open('files/bluebikes_small.csv', 'r') as f:
#    data = f.readlines()
#data = iter(data)

''' Put your code here '''


''' Please do not revise the print statements below '''
print(len(bike_id))
print(str(type(bike_id)))
print(list(bike_id))

# Exercise 12: Still More Bluebikes (Dictionary Comprehension)

Use the data in the Boston Bluebikes database again to create a dictionary by dictionary comprehension. The dictionary keys should come from the station names found in the <code>"end station name"</code> data field and the values should be a tuple containing each station's latitude and longitude, in that order.  The latitude and longitude in the value tuple should be represented by an appropriate numerical data type.  The resulting dictionary should be named <code>station_gis</code>.

Recall that I have provided a smaller subset of this data, via statements that are currently commented out, that you can use to more quickly develop your code.
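As a rough sketch of the dictionary comprehension syntax, the example below maps a name to a tuple of floats. The three-field rows are hypothetical and much simpler than the Bluebikes records.

```python
lines = [
    'Depot,42.5,-71.25',
    'Harbor,42.0,-71.5',
    'Depot,42.5,-71.25',
]  # hypothetical rows: name, latitude, longitude

coords = {
    name: (float(lat), float(lon))
    for name, lat, lon in (line.split(',') for line in lines)
}

print(coords)
```

Because dictionary keys are unique, a name that appears on several lines ends up with a single entry (the last one seen).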

In [None]:
import requests

''' Get the large data set from the Internet with the requests module '''
''' This code creates a generator that presents the data from the file line by line '''

url = 'https://jrbrad.people.wm.edu/data/ctba/bluebikes_clean.csv'
response = requests.get(url)
data = (line for line in response.text.split('\r\n'))
    
#print(f'Acquisition time: {time.time() - start} seconds')

''' Input the small version of the data file '''
#with open('files/bluebikes_small.csv', 'r') as f:
#    data = f.readlines()
#data = iter(data)

''' Put your code here '''

        
''' Please do not revise the print statements below '''
print(len(station_gis))    
print(station_gis) 

# Exercise 13: Practice with the <code>enumerate</code>  Function

This exercise repeats the previous exercise where you documented latitude and longitude of each Bluebikes station using dictionary comprehension, except the data set is different.  The goal of this exercise is also a bit different because the data file referenced in the code below has not been sufficiently cleaned.  So, your goal here is to use the <code>enumerate()</code> function to do the following:

- Determine the index of the line where a problem with the data first occurs
- Print out the offending line, and perhaps other previous lines, to determine what the issue is

Note that <code>bluebikes.csv</code> is significantly larger than the data file you worked with previously.  As was done previously, a smaller file is available here for you to input.

Once you find the first line that causes an error, decide on a good description of that error and store a succinct description of the problem as a string in the variable <code>err_desc</code>.  If you wish, although it is not required, you may look for a second issue with the data by putting a conditional statement in your code that ignores the lines of data with the problem you first identified.
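One possible pattern for locating the first problematic line is to combine <code>enumerate()</code> with <code>try</code>/<code>except</code>. The sketch below uses hypothetical two-field rows, and the kind of error it catches is an assumption; the real file may fail in a different way.

```python
lines = ['a,1.5', 'b,2.0', 'c,oops', 'd,4.0']  # hypothetical data

first_bad = None
for i, line in enumerate(lines):
    try:
        float(line.split(',')[1])      # the conversion that might fail
    except (ValueError, IndexError):
        first_bad = i                  # remember where it failed
        break

if first_bad is not None:
    # repr() makes stray whitespace or quote characters visible
    print(first_bad, repr(lines[first_bad]))
```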

In [None]:
import requests

''' Get the large data set from the Internet with the requests module. This initial code block 
    creates a generator that can be used in a for loop.  That is, the variable data is an iterable. '''

url = 'https://jrbrad.people.wm.edu/data/ctba/bluebikes.csv'
response = requests.get(url)
data = (line for line in response.text.split('\r\n'))

#print(f'Acquisition time: {time.time() - start} seconds')

''' Input the small version of the data file for testing and development purposes '''
#with open('files/bluebikes_small_dirty.csv', 'r') as f:
#    data = f.readlines()
#data = iter(data)

''' Put your code here '''

        
err_desc = ''
''' Finish code above this line: Please do not revise the print statements below '''
print(err_desc)
print(len(station_gis2)) 
print(station_gis2) 

# Exercise 14: Weather Bragging Rights

Professor Bradley has friends in Bath, Maine and he likes to remind them how warm the weather in Williamsburg is, particularly in early spring, when Maine still has snow on the ground and low temperatures while it is often warm here.  Professor Bradley has obtained two data sets with weather data from NOAA/NCDC:
- files/BathME.csv
- files/Wburg.csv

He is particularly interested in comparing the maximum temperatures on a daily basis, which have a column heading of <code>TMAX</code> in both data files from NOAA.  Note that each data set starts on 1/1/2020 and both have daily weather, but they may not have the same number of days of data.

Do these tasks:
- Input the two data sets into Python
- Create lists or tuples with data from just the <code>TMAX</code> field from each data set
- Use the <code>zip()</code> function to align the maximum temperatures in the two locations within a new data structure/variable
- Use that data variable to print out these results, in this order:
  - How many days are being compared?  Use the variable <code>num_comp</code> for this value.
  - How many of those days does Williamsburg have a higher temperature than Bath, ME? Use the variable <code>wburg_high</code> for this value.
  - Just print the numerical answers for the two items above: no text explaining what the data are is required.
  
Please note the data format presents a couple of (fun) minor hurdles in completing this exercise successfully.
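The behavior of <code>zip()</code> with inputs of unequal length can be sketched on made-up numbers (the lists `a` and `b` are hypothetical and already numeric). Values read from a CSV file arrive as strings and would need converting first, since comparing strings does not compare them numerically.

```python
a = [50, 61, 70, 45]   # hypothetical readings, location A
b = [48, 65, 66]       # hypothetical readings, location B (one day fewer)

pairs = list(zip(a, b))   # zip stops at the end of the shorter input
n_pairs = len(pairs)
a_higher = sum(1 for x, y in pairs if x > y)

print(n_pairs)
print(a_higher)
```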

In [None]:
''' Put your code here '''

num_comp = 
wburg_high = 

''' Please do not revise the statements below '''
print(num_comp)
print(wburg_high)

# Exercise 15: Using the <code>*</code> Operator 

Append the two lists <code>list1</code> and <code>list2</code> using the <code>*</code> operator into a list named <code>list_tot</code>.  You may not use the list <code>.append()</code> method, the <code>+</code> operator, or any other function or operator besides the <code>*</code> operator.
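For background, a <code>*</code> placed before an iterable inside a list display unpacks its elements in place. A minimal sketch on hypothetical lists:

```python
first = ['a', 'b']   # hypothetical data
second = ['c']

# Each starred name contributes its elements, in order, to the new list
combined = [*first, *second]

print(combined)
```

The original lists are left unchanged; `combined` is a new list.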

In [None]:
list1 = [7, 4, 9, 10, 6]
list2 = [16, 16, 15, 18, 11]
list_tot = 

print(len(list_tot))
print(list_tot)

# Exercise 16: <code>zip</code> Again 

Interpret the list <code>data</code> as a collection of (<code>x</code>, <code>y</code>) coordinates that are to be plotted in the form of a line.  Transform <code>data</code> into two separate lists, named <code>x</code> and <code>y</code> for the <code>x</code> series and the <code>y</code> data series, respectively.

You must complete this task by finishing the statement on the second line below by replacing the text, "finish_statement_here", with appropriate code.  You may use only that one line, without adding other lines, to make this code operational.
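As background on the idiom, combining <code>zip()</code> with <code>*</code> unpacking regroups a sequence of pairs by position. The sketch below uses hypothetical data and produces tuples; whether tuples or lists are wanted is left to the exercise.

```python
pairs = [('a', 1), ('b', 2), ('c', 3)]  # hypothetical data

# zip(*pairs) is equivalent to zip(('a', 1), ('b', 2), ('c', 3))
letters, numbers = zip(*pairs)

print(letters)
print(numbers)
```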

In [None]:
data = [[0, 20], [1, 8], [2, 6], [3, 17], [4, 10], [5, 6], [6, 20], [7, 13], [8, 10], [9, 9]]
x, y = finish_statement_here
print((len(x), len(y)), '\n', x, '\n', y)

# Exercise 17: Complete the Code 

Complete the function <code>fib_again</code>, which is intended to generate the first twelve terms of the Fibonacci Series, by placing a short piece of code in the fourth line of the function where the text "insert_here" appears.  That statement recreates the <code>fib</code> list in each iteration of the <code>for</code> loop.  Ask yourself what elements need to appear in the <code>fib</code> list before the value of the next term, which has already been coded for you.

In [None]:
def fib_again():
    fib = [1,1]
    for _ in range(10):
        fib = [insert_here, fib[-1]+fib[-2]]
    return fib

print(fib_again())