# Project Milestone Template

### Step 1a: Planning 
#### Identify the information in the file your program will read

Describe (all) the information that is available. Be sure to note any surprising or unusual features. (For example, some information sources have missing data, which may be blank or flagged using values like -99, NaN, or something else.)

<font color="blue">
    
Each row of the file describes one poll. The cycle is the year of the American election, which is 2016 for all polls, since these are presidential polls conducted in 2016. The forecast date is November 1st, 2016 for all polls, because the forecast was made after all the polls had closed, which was close to that day. The state is where the poll was conducted and acts as the location indicator. The start and end dates indicate when the poll began collecting responses and when it stopped. The pollster is the media outlet or organization that conducted the poll. The branch is the branch of government the election is for, which is President for all polls. The matchup indicates whose polling numbers are assessed in the poll; for all of them it is Trump, Clinton, and Johnson. The type of poll is polls-plus for all of them, a type that combines the polls with an economic index.

</font>
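
One way to check a file for blank or flagged values before designing any data definitions is to count them per column. The following is a minimal sketch, not part of the milestone template: the sample rows are made up, the `MISSING_FLAGS` set and the helper `count_missing` are hypothetical, and the column names are used only because they are mentioned elsewhere in this notebook.

```python
import csv
import io
from typing import Dict

# Hypothetical flags for missing data; adjust to whatever the real file uses.
MISSING_FLAGS = {"", "NaN", "-99"}

def count_missing(csvfile) -> Dict[str, int]:
    """Return, for each column, how many rows hold a blank or flagged value."""
    reader = csv.DictReader(csvfile)
    counts = {name: 0 for name in (reader.fieldnames or [])}
    for row in reader:
        for name in counts:
            value = row[name]
            # DictReader fills fields missing from a short row with None
            if value is None or value.strip() in MISSING_FLAGS:
                counts[name] += 1
    return counts

# Made-up sample data standing in for the real poll file.
SAMPLE = io.StringIO(
    "state,adj_pollclinton,adj_polltrump\n"
    "U.S.,42.6,40.9\n"
    "Florida,,45.1\n"
    "Ohio,NaN,-99\n"
)

print(count_missing(SAMPLE))
```

A column whose count is greater than zero is a candidate for the "surprising or unusual features" this step asks about.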

### Step 1b: Planning 
#### Brainstorm ideas for what your program will produce
#### Select the idea you will build on for subsequent steps

You must brainstorm at least three ideas for graphs or charts that your program could produce and choose the one that you'd like to work on. You can choose between a line chart, histogram, bar chart, scatterplot, or pie chart.

If you would like to change your project idea from what was described in the proposal, you will need to get permission from your project TA. This is intended to help ensure that your new project idea will meet the requirements of the project. Please see the project proposal for things to be aware of when communicating with your project TA.

<font color="blue">
    
1. A bar chart with each candidate's name on the x-axis and their average polling number on the y-axis.
2. A pie chart where each sector represents a candidate's proportion of the total score. For example, if there are 700 polls, the total score would be 700*100 (since polls are measured in percentages). Each candidate's scores out of 100 across all 700 polls are added up and divided by the total score, which determines how large their sector will be on the pie chart.
3. A line chart of the candidates' polling scores over time. Each line will represent the trend in one candidate's poll numbers over the course of the campaign.
    
I will build on idea 1.

</font>
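
The arithmetic behind ideas 1 and 2 can be sketched as follows. This is only an illustration under stated assumptions: the three polling scores are made up, and `average_score` and `pie_share` are hypothetical helpers rather than functions from the course library.

```python
from typing import List

def average_score(scores: List[float]) -> float:
    """Idea 1: the mean polling score across all polls (0.0 for an empty list)."""
    if not scores:
        return 0.0
    return sum(scores) / len(scores)

def pie_share(scores: List[float]) -> float:
    """Idea 2: the summed score divided by the maximum total (100 per poll)."""
    if not scores:
        return 0.0
    return sum(scores) / (100 * len(scores))

# Made-up scores for one candidate across three polls.
clinton = [40.0, 50.0, 45.0]
print(average_score(clinton))  # 45.0
print(pie_share(clinton))      # 0.45
```

Because the three candidates' scores in a single poll usually add up to less than 100, the shares computed this way need not sum to 1, which is worth keeping in mind when sizing the sectors.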

### Step 1c: Planning 
#### Write or draw examples of what your program will produce

You must include a **hand-drawn image** that shows what your chart or plot will look like. You can insert an image using Edit -> Insert Image.

Insert your image into this cell.

### Step 2a: Building
#### Document which information you will represent in your data definitions

Before you design data definitions in the code cell below, you must explicitly document here which information in the file you chose to represent and why that information is crucial to the chart or graph that you'll produce when you complete step 2c.

<font color="blue">
    
I will be using information from the matchup, state, adj_pollclinton, adj_polltrump, and adj_polljohnson columns. The matchup is the same for all polls and includes the names of all the candidates. The names are crucial because each one will label a single bar on the bar chart. Each poll has a state of either "U.S." or the name of an individual state. If the state is "U.S.", the poll is nationwide; otherwise, it is a statewide poll. Nationwide and statewide polls are weighted evenly in the final average polling score, so knowing whether a poll is statewide or nationwide is necessary. adj_pollclinton, adj_polltrump, and adj_polljohnson are the adjusted polling scores for each candidate. All the polls are of type polls-plus, which combines the polls with an economic index and adjusts the numbers accordingly to improve accuracy. Therefore, the calculations will use the adjusted polling numbers rather than the raw ones.


</font>
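
One possible reading of "weighted evenly" is that the nationwide average and the statewide average each contribute half of a candidate's final score. The sketch below assumes that reading, which the text above does not confirm. The `Poll` alias, the helper names, and the sample values are hypothetical and simplified compared to the data definitions in this notebook.

```python
from typing import List, Tuple

# (is_nationwide, polling score) pairs; a simplified stand-in for the real data.
Poll = Tuple[bool, float]

def mean(values: List[float]) -> float:
    """Arithmetic mean, or 0.0 for an empty list."""
    return sum(values) / len(values) if values else 0.0

def evenly_weighted_average(polls: List[Poll]) -> float:
    """Average the nationwide mean and the statewide mean with equal weight.

    Assumption: an empty group counts as 0.0, which a real design may
    want to handle differently.
    """
    nationwide = [score for is_nationwide, score in polls if is_nationwide]
    statewide = [score for is_nationwide, score in polls if not is_nationwide]
    return (mean(nationwide) + mean(statewide)) / 2

# Made-up polls: one nationwide and two statewide.
polls = [(True, 40.0), (False, 50.0), (False, 60.0)]
print(evenly_weighted_average(polls))  # 47.5
```

Under this reading, a single nationwide poll carries as much weight as all the statewide polls combined, which differs from simply averaging every poll together.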

#### Design the data definitions

In [1]:
from cs103 import *
import csv
from typing import NamedTuple, List
from enum import Enum

In [16]:
##################
# Data Definitions

PollBoundary = Enum('PollBoundary',["statewide", "nationwide"])

#interp. a poll boundary that can be statewide (Statewide) or nationwide (Nationwide)
#examples are redundant for enumerations

@typecheck
def fn_for_poll_boundary(pb: PollBoundary)-> ...:
    #template from enumeration (2 cases)
    if pb==PollBoundary.statewide:
        return...
    elif pb==PollBoundary.nationwide:
        return...
    
PresidentialPoll = NamedTuple('PresidentialPoll',[('boundary',PollBoundary),
                                                  ('clinton_polling', float), #in range [0,100]
                                                  ('trump_polling',float), #in range [0,100]
                                                 ('johnson_polling',float)]) #in range [0,100]


#interp. a presidential poll from 2016 that includes the poll boundary (boundary), which can be statewide or nationwide, 
#and the polling score for each candidate (clinton_polling, trump_polling, johnson_polling)

PP_0 = PresidentialPoll(PollBoundary.statewide, 44.7,44.3,2.1)
PP_1 = PresidentialPoll(PollBoundary.nationwide, 34.5,48.9,1.8)
PP_2 = PresidentialPoll(PollBoundary.statewide, 0, 89.9,3.5)
PP_3 = PresidentialPoll(PollBoundary.nationwide,76.9,0,13.4)
PP_4 = PresidentialPoll(PollBoundary.statewide,43.8,50.1,0)

@typecheck
def fn_for_presidential_poll(pp: PresidentialPoll) -> ...:
    #template from Compound (4 fields) and reference rule 
    return...(fn_for_poll_boundary(pp.boundary),pp.clinton_polling,pp.trump_polling,pp.johnson_polling)

PresidentialPollList = List[PresidentialPoll]
# interp. a list of presidential polls

LOPP0 = []
LOPP1 = [PP_0,PP_1]
LOPP2 = [PP_0,PP_2,PP_4]
LOPP3 = [PP_1,PP_3]


@typecheck
def fn_for_lopp(lopp: List[PresidentialPoll]) -> ...:
    #template from arbitrary-sized and reference rule
    #description of accumulator
    acc = ... #type:
    for pp in lopp:
        acc = ...(fn_for_presidential_poll(pp), acc)
    return...(acc)


### Step 2b: Building
#### Design a function to read the information and store it as data in your program

Complete this step in the code cell below. Your `read` function should remove any row with invalid or missing data but otherwise keep all the data. I.e., you should **not** design the `read` function such that it only returns the data you need for step 2c.

You can choose to continue to build on this file when completing the final submission for the project (as opposed to copying your code over to the `project_final_submission_template.ipynb` file). However, if this is the approach you are taking, please go to the `project_final_submission_template.ipynb` file and read through the "Step 2b and 2c: Building" section. This section contains crucial information about common issues students encounter. We expect that you will be familiar with this information.
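
The requirement to drop rows with invalid or missing data can be sketched with a small helper that attempts the conversion and reports failure instead of raising an error. This is an illustration only: `parse_percentage` is a hypothetical name, and returning `None` for bad input is an assumption, not a course convention.

```python
from typing import Optional

def parse_percentage(s: str) -> Optional[float]:
    """Return s as a float in [0, 100], or None if s is blank, non-numeric, or out of range."""
    try:
        value = float(s)
    except ValueError:
        # blank strings and non-numeric text end up here
        return None
    # float("NaN") parses successfully, but every comparison with NaN is False,
    # so the range check below rejects it as well.
    if 0 <= value <= 100:
        return value
    return None

print(parse_percentage("42.6"))  # 42.6
print(parse_percentage(""))      # None
print(parse_percentage("NaN"))   # None
print(parse_percentage("-99"))   # None
```

A `read` function could skip any row for which such a helper returns `None` for a required column.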

In [62]:
###########
# Functions

@typecheck
def parse_poll_boundary(s: str) -> PollBoundary:
    """
    returns string s as a PollBoundary
    Assume s is the name of a US state or "U.S."
    """
    #return PollBoundary.nationwide #stub
    #template from atomic non-distinct
    #return...(s)
    if s=="U.S.":
        return PollBoundary.nationwide
    else:
        return PollBoundary.statewide

start_testing()
expect(parse_poll_boundary("U.S."), PollBoundary.nationwide)
expect(parse_poll_boundary("Pennsylvania"), PollBoundary.statewide)
summary()

@typecheck
def read(filename: str) -> List[PresidentialPoll]:
    """    
    reads information from the specified file and returns a list of presidential polls
    """
    #return []  #stub
    # Template from HtDAP
    # lopp contains the result so far
    lopp = [] # type: List[PresidentialPoll]

    with open(filename) as csvfile:
        
        reader = csv.reader(csvfile)
        next(reader) # skip header line

        for row in reader:
            # you may not need to store all the rows, and you may need
            # to convert some of the strings to other types
            if is_reliable(row):
                pp = PresidentialPoll(parse_poll_boundary(row[5]), float(row[17]), float(row[18]), float(row[19]))
                lopp.append(pp)
    return lopp

start_testing()
expect(read("Test 1"),[])
expect(read("Test 2"),[PresidentialPoll(PollBoundary.nationwide,42.6414,40.86509,5.675099)])
expect(read("Test 3"),[PresidentialPoll(PollBoundary.nationwide,44.6508,42.26663,6.114222), PresidentialPoll(PollBoundary.nationwide,42.21983,41.6954,4.220173)])
    
summary()


@typecheck
def is_reliable(row: List[str]) -> bool:
    """
    returns True if there is a polling number present for each candidate (row[17],row[18],row[19]),
    a state name or the country name present (row[5]),
    polling numbers are above or equal to 0, and below or equal to 100,
    AND polling numbers do not sum to a value higher than 100
    
    Specifically, row[17],row[18],row[19] must exist (non-empty), represent polling numbers, 
    and be the string form of a float
    
    An empty string for row[17],row[18], or row[19] is not reliable as it does not represent a number,
    and assuming it means 0 is too ambiguous: given the sample size of each poll, the chance
    of a candidate polling at exactly 0% is too low.
    An empty string for row[5] is not reliable as it does not represent a state or the country,
    so it cannot be decided whether the poll is statewide or nationwide.
    A polling number below 0 is not reliable as that is not a possible score, and a polling number above 100
    is also not reliable as a candidate cannot have more than 100% of the total distribution of polling scores
    
    """
    #return False #stub
    #template from atomic with indexing
    # any missing polling number or missing location makes the row unreliable
    if row[17]=="" or row[18]=="" or row[19]=="" or row[5]=="":
        return False
    # convert each polling number once instead of repeating float(...) calls
    clinton = float(row[17])
    trump = float(row[18])
    johnson = float(row[19])
    return (0<=clinton<=100 and 0<=trump<=100 and 0<=johnson<=100
            and clinton+trump+johnson<=100)
start_testing()
expect(is_reliable(["2016","President","polls-plus","Clinton vs. Trump vs. Johnson","11/1/16","Florida","10/25/2016","10/31/2016","","","","","","","","","","34.6","30.1","3.7","","","","","","",""]), True)
expect(is_reliable(["2016","President","polls-plus","Clinton vs. Trump vs. Johnson","11/1/16","U.S.","10/25/2016","10/31/2016","","","","","","","","","","14.7","60.8","10.6","","","","","","",""]), True)
expect(is_reliable(["2016","President","polls-plus","Clinton vs. Trump vs. Johnson","11/1/16","","10/25/2016","10/31/2016","14.7","60.8","10.6","","","","","","",""]), False)
expect(is_reliable(["2016","President","polls-plus","Clinton vs. Trump vs. Johnson","11/1/16","U.S.","10/25/2016","10/31/2016","","40.3","20.9","","","","","","",""]), False)
expect(is_reliable(["2016","President","polls-plus","Clinton vs. Trump vs. Johnson","11/1/16","Ohio","10/25/2016","10/31/2016","33.1","","30.5","","","","","","",""]), False)
expect(is_reliable(["2016","President","polls-plus","Clinton vs. Trump vs. Johnson","11/1/16","California","10/25/2016","10/31/2016","45.5","50.0","","","","","","","",""]), False)
expect(is_reliable(["2016","President","polls-plus","Clinton vs. Trump vs. Johnson","11/1/16","","10/25/2016","10/31/2016","47.0","20.0","","","","","","","",""]), False)
expect(is_reliable(["2016","President","polls-plus","Clinton vs. Trump vs. Johnson","11/1/16","","10/25/2016","10/31/2016","70.3","","3.9","","","","","","",""]), False)
expect(is_reliable(["2016","President","polls-plus","Clinton vs. Trump vs. Johnson","11/1/16","Indiana","10/25/2016","10/31/2016","","34.5","","","","","","","",""]), False)
# expect(is_reliable([]), False)
expect(is_reliable(["2016","President","polls-plus","Clinton vs. Trump vs. Johnson","11/1/16","","10/25/2016","10/31/2016","","","","","","","","","","","","","","","","","","","",""]), False)
expect(is_reliable(["2016","President","polls-plus","Clinton vs. Trump vs. Johnson","11/1/16","Arkansas","10/25/2016","10/31/2016","","","","","","","","","","","","","","","","","","",""]), False)
expect(is_reliable(["2016","President","polls-plus","Clinton vs. Trump vs. Johnson","11/1/16","","10/25/2016","10/31/2016","","","","","","","","","","33.6","","","","","","","","",""]), False)
expect(is_reliable(["2016","President","polls-plus","Clinton vs. Trump vs. Johnson","11/1/16","","10/25/2016","10/31/2016","","","","","","","","","","","76.7","","","","","","","",""]), False)
expect(is_reliable(["2016","President","polls-plus","Clinton vs. Trump vs. Johnson","11/1/16","","10/25/2016","10/31/2016","","","","","","","","","","","","23.4","","","","","","",""]), False)
expect(is_reliable(["2016","President","polls-plus","Clinton vs. Trump vs. Johnson","11/1/16","Texas","10/25/2016","10/31/2016","","","","","","","","","","30.5","50.3","40.5","","","","","","",""]), False)
expect(is_reliable(["2016","President","polls-plus","Clinton vs. Trump vs. Johnson","11/1/16","Texas","10/25/2016","10/31/2016","","","","","","","","","","1.5","4.3","7.5","","","","","","",""]), True)
summary()
# Begin testing
start_testing()

# Examples and tests for read
expect(..., ...)

# show testing summary
summary()

2 of 2 tests passed
3 of 3 tests passed
16 of 16 tests passed
1 of 1 tests passed


In [63]:
# Be sure to select ALL THE FILES YOU NEED (including csv's) 
# when you submit. Also, UNLIKE USUAL, YOU CAN EDIT THIS CELL!
# That's in case you want to switch the ASSIGNMENT code for the final
# submission. Run this cell to start the submission process.
from cs103 import submit

COURSE = 123409
ASSIGNMENT = 1615245
#ASSIGNMENT = 1615244 # UNCOMMENT for final submission and COMMENT line above

submit(COURSE, ASSIGNMENT)

# If your submission fails, SUBMIT by downloading your files and uploading them to 
# Canvas. You can learn how on the page "How to submit your Jupyter notebook" on 
# our Canvas site.


# Please double check your submission on Canvas to ensure that the right files (Jupyter file + CSVs) have been submitted and that the files do not contain unexpected errors.

<font color="red">**You should always check your submission on Canvas. It is your responsibility to ensure that the correct file has been submitted for grading.**</font> Regrade or accommodation requests using reasoning such as "I didn't realize I submitted the wrong file"/"I didn't realize the submission didn't work"/"I didn't realize I didn't save before submitting so some of my work is missing" will not be considered.