# Homework: UFO Two-Point-Oh!

## The Problem

For this assignment, we will enhance the End-To-End Example from this unit. The EETE loads 6 months of UFO reports and then allows the user to search the reports by State or Shape. 

As you may recall from that example, we created two functions. The first, `read_ufo_data()`, reads the dataframes and concatenates them with `pd.concat()`. The second, `dedupe_series()`, takes a `pd.Series` as input and returns a list of unique values for creating a drop-down widget. 

Finally, we created an interact UI where the user can select a State, a UFO shape, or both, then query the data and output a dataframe. 

### Your Additions to the EETE

We will add two additional search criteria to this program: 

1. Search by Day of the Week e.g. Monday, Tuesday, Wednesday...
2. Search by Color: Red, Green, White, ...

In both cases we will need to engineer a new `pd.Series`; let's call them `DayOfWeek` and `Colors`. The approach to engineering both is the same: the `df.apply(lambda row:)` method.

The final program will allow a user to select a Day of the week, UFO Color, UFO Shape, State, or any combinations thereof, and then output the matching rows in a dataframe.

### Video of an example run:

<a href="//imgur.com/a/LC80EHE">UFO Sample run</a>

## Approach:

This assignment is broken up into parts. We will use the same approach as in the past few assignments: simplify the problem into "parts", then take a bottom-up approach and assemble the "parts" together.

- **You Code 2.1** Write function `get_day_of_week(date)` to get Day of week from a Date
- **You Code 2.2** Use `apply()` to generate the column using the function from 2.1
- **You Code 2.3** Write a function `extract_colors(summary)` to get the Colors from the summary.
- **You Code 2.4** Use `apply()` to generate the column using the function from 2.3
- **You Code 2.5** Assemble the program from its parts

Since we are taking a bottom up approach, **hold off on completing part 1, until you are on step 2.5**

## Part 1: Problem Analysis

You will complete a problem analysis for the entire program. **Since we are using the bottom-up approach, do not attempt until step 2.5**


### 1.1 Program Outputs

Describe your program outputs in the cell below. 


### 1.2 Program Inputs

List out the program inputs in the cell below.


### 1.3 The Plan (Algorithm)

Explain, as specifically as you can, without writing code, how the program works from input to output. Be detailed with your plan as you will need to turn it into code. 


## Part 2: Code Solution

You may write your code in several cells, but place the complete, final working copy of your code solution within the single cell below. Only the code within this cell will be considered your solution. Any imports or user-defined functions should be copied into this cell. 

In [1]:
# PASTE THE CODE from "Complete Working Code" Section of the EETE Here and RUN THE CODE!
import pandas as pd
from IPython.display import display, HTML
from ipywidgets import widgets, interact_manual
import warnings
warnings.filterwarnings('ignore')
pd.set_option('display.max_colwidth', None)

def read_ufo_data():
    ufos = []
    for i in range(1,6):
        ufo = pd.read_csv(f'https://raw.githubusercontent.com/mafudge/datasets/master/ufo-sightings/ufo-sightings-2016-0{i}.csv')
        ufos.append(ufo)
    df = pd.concat(ufos, ignore_index=True)
    return df

def dedupe_series(series: pd.Series) -> list[str]:
    values = sorted(list(series.dropna().unique()))
    values.insert(0, "*ANY*")
    return values

df = read_ufo_data()
states = dedupe_series(df['State'])
shapes = dedupe_series(df["Shape"])

display(HTML("<h1>Search UFO Sightings</h1>"))
@interact_manual(state=states, shape=shapes)
def on_click(state, shape):
    search_df = df
    if state != '*ANY*':
        search_df = search_df[ search_df.State == state ]
    if shape != '*ANY*':
        search_df = search_df[ search_df.Shape == shape ]
    display(search_df)


### You Code 2.1: Write function `get_day_of_week(date)` to get Day of week from a Date

Assuming you copied and executed the EETE code into this notebook, you should have a variable called `df` which represents the UFO sightings. 

The `Date / Time` column indicates when the UFO was sighted. If you execute `df['Date / Time']` you can see the format of the column's values. For example, here are three such values:

```
'1/31/16 23:10'  
'2/17/16 18:05'  
'5/24/16 22:00'
```

We want to write a function called `get_day_of_week()` that takes one of those date/time strings as **input** and returns the day of the week as **output**. For example:

```
#                           <input>        <output>
assert get_day_of_week("1/31/16 23:10") == "Sunday"
assert get_day_of_week("2/17/16 18:05") == "Wednesday"
assert get_day_of_week("5/24/16 22:00") == "Tuesday"
```

To accomplish this task you will need to learn about **date parsing** in Python. As you may recall, **parsing** is the act of deriving meaning from text. In this case we take a text string such as `"1/31/16 23:10"` as input and parse it into a Python `datetime` type. Once the string has been read as a date, we can ask it questions such as: What is the next month? Is this the last day of the month? And most importantly for us: What day of the week is this? 


The `get_day_of_week()` function is mostly written for you. All you need to do is figure out the format codes that make the parser and the formatter work. You will need TWO format strings: one to parse the date, the other to display the day of the week from the parsed datetime.


**Parsing with strptime()**

`strptime(datestring, format)` parses the `datestring` using the specified `format`, and returns back a Python `datetime` type.

You'll need to identify the format of the date and time in the input, and convert it to a format string. Here are the codes:

https://docs.python.org/3/library/datetime.html#strftime-and-strptime-format-codes

Here is a sandbox where you can play around with the codes

https://www.strfti.me/

For example, take `'5/24/16'` as the `dateonly` part. What is its format? Where is the month? The year? The day? What separates the month from the day? Answer these questions and you can build the format string for the parser!
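
To see how format codes line up with an input string, here is a minimal sketch that parses a made-up ISO-style timestamp. The `stamp` value and its format are assumptions chosen for illustration; they are deliberately not the format used in the UFO dataset, so you will still need to work out your own format string.

```python
from datetime import datetime

# Hypothetical ISO-style timestamp, NOT the format found in the UFO data.
stamp = "2016-05-24T22:00"

# %Y = 4-digit year, %m = month, %d = day, %H = 24-hour hour, %M = minute.
# Literal characters such as "-", "T" and ":" must match the input exactly.
parsed = datetime.strptime(stamp, "%Y-%m-%dT%H:%M")

print(parsed.year, parsed.month, parsed.day)  # 2016 5 24
print(parsed.hour, parsed.minute)             # 22 0
```

If the format string does not match the input, `strptime()` raises a `ValueError`, which is usually the first hint that a code or a separator is wrong.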


**Formatting with strftime()** 

`datetime.strftime(format)` converts a Python `datetime` type back into a formatted date. You need to supply the proper format code to print the day of the week. 
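
As a rough illustration of formatting, the sketch below turns a `datetime` into two different strings. The format codes shown are examples only and are not the day-of-week code this exercise asks for; the month name and AM/PM text assume Python's default (English) locale.

```python
from datetime import datetime

moment = datetime(2016, 5, 24, 22, 0)

# %B = full month name, %d = zero-padded day, %Y = 4-digit year
long_date = moment.strftime("%B %d, %Y")

# %I = 12-hour hour, %M = minute, %p = AM or PM
clock = moment.strftime("%I:%M %p")

print(long_date)  # May 24, 2016
print(clock)      # 10:00 PM
```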


In [None]:
# SOLUTION CELL 2.1
def get_day_of_week(datestr: str) -> str:
    from datetime import datetime
    dateonly = datestr.split()[0].strip()
    parsed_datetime = datetime.strptime(dateonly, "?TODO-Replace-Me-With-Date-Format?")
    day_of_week = parsed_datetime.strftime("?TODO-Replace-Me-With-Day-of-Week-Format?")
    return day_of_week


# TESTS do not change the code below here... make your function pass the tests
def test_get_day_of_week(datestr, expect):
    actual = get_day_of_week(datestr)
    print(f"DATESTR='{datestr}' EXPECT='{expect}' ACTUAL='{actual}'")
    assert expect == actual


test_get_day_of_week(datestr="1/31/16 23:10", expect="Sunday")
test_get_day_of_week(datestr="2/8/16 18:05", expect="Monday")
test_get_day_of_week(datestr="5/24/16", expect="Tuesday")

### You Code 2.2: Use `apply()` to generate the column using the function from 2.1

With a working function, now use `df.apply()` to create a new Series in the DataFrame called `DayOfWeek`.

**This transformation should require one line of Python code to complete properly.** Please consult the readings, large group, lab, and small group from this week for many examples of how this is done. 

After you create the new column, call the `dedupe_series()` function to create a list of unique `days` from the Series.

`print()` the `days` so the code checker can verify you completed this step properly. 
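
As a reminder of the general shape of this transformation, here is a small sketch on a toy DataFrame. The `toy` frame, its `Title` column, and the `count_words()` helper are all hypothetical and unrelated to the UFO data; only the `apply(lambda row: ..., axis=1)` pattern carries over.

```python
import pandas as pd


def count_words(text: str) -> int:
    """Return the number of whitespace-separated words in text."""
    return len(text.split())


# Toy data for illustration only
toy = pd.DataFrame({"Title": ["bright light", "three slow moving orbs", "silence"]})

# One line: call the function once per row and store the results as a new column
toy["WordCount"] = toy.apply(lambda row: count_words(row["Title"]), axis=1)

print(toy)
```

With `axis=1`, the lambda receives one row at a time, so `row["Title"]` is a single string rather than the whole column.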


In [None]:
# SOLUTION CELL 2.2
import pandas as pd

# COPY read_ufo_data() function here


# COPY dedupe_series() function here


# COPY get_day_of_week() function here


# main code
# CALL read_ufo_data() here to load the dataframe

# Create `DayOfWeek` Series in the DataFrame

# Dedupe `DayOfWeek` Series  into a list


# FOR CHECKER: print the list


### You Code 2.3 - Write a function `extract_colors(summary)` to get all the colors found in the summary.

The `Summary` Series contains a text description of what the person who sighted the UFO saw. The person will often indicate colors such as `"I saw a red and white flash in the sky"`. We want to take this summary string and from it, extract out the colors, for example: `'red white'`. 

Ideally we would use some kind of AI (specifically "Named Entity Recognition", https://en.wikipedia.org/wiki/Named-entity_recognition, a form of "Natural Language Processing") for this type of task. We will learn about and use NER later. For now we will extract the colors using a curated list of colors and our own algorithm.

```
curated_colors = ['white', 'orange', 'yellow', 'red', 'blue', 'green']
```

Our `extract_colors(text: str) -> str` function takes some `text` as input and outputs a string of the color names found in the text, separated by spaces. Here is the **algorithm**, once again using the classic search pattern:

    1. start with an empty string for colors
    2. lowercase the text (to improve matching)
    3. for each curated color
    4.     if you find the curated color in the lowercased text
    5.         concatenate the color + a space to the colors string
    6. strip the whitespace from the colors string before returning it
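
The same search pattern can be sketched on an unrelated problem. The example below is an assumption-laden variation, not the assignment solution: `find_keywords()` and the `animals` list are hypothetical, and it collects matches in a list and joins them at the end instead of concatenating and stripping a string.

```python
def find_keywords(text: str, keywords: list[str]) -> str:
    """Return the keywords found in text, space-separated, in keyword-list order."""
    found = []
    lowered = text.lower()          # lowercase once to improve matching
    for keyword in keywords:
        if keyword in lowered:      # simple substring test
            found.append(keyword)
    return " ".join(found)          # empty list joins to an empty string


animals = ["cat", "dog", "owl"]
print(find_keywords("The OWL watched the Dog", animals))  # dog owl
```

Note that the result follows the order of the keyword list, not the order of the words in the text. A substring test also matches inside longer words (for instance, `cat` inside `concatenate`), which is a known limitation of this simple approach.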


Implement this function and get it to pass the tests provided.

In [None]:
# SOLUTION CELL 2.3
def extract_colors(text: str) -> str:
    curated_colors = ['white', 'orange', 'yellow', 'red', 'blue', 'green']
    # TODO write the rest

#TESTS do not change the code below here... make your function pass the tests
def test_extract_colors(text, expect):
    actual = extract_colors(text)
    print(f"TEXT='{text}' EXPECT='{expect}' ACTUAL='{actual}'")
    assert expect == actual


test_extract_colors(text="Santa wears a Red and White hat", expect='white red')
test_extract_colors(text="the mexican flag is GREEN, RED and WHITE", expect='white red green')
test_extract_colors(text="I am so blue", expect='blue')
test_extract_colors(text="Cheese for me please!", expect='')


### You Code 2.4 - Use `apply()` to generate the column using the function from 2.3
 
With a working function, now use `df.apply()` to create a new Series in the DataFrame called `Colors`.

**Once more this transformation should require one line of Python code to complete properly.** 

No need to dedupe, as we have our `curated_colors`.

`print()` the `Colors` Series value for the single report at index 15 so the code checker can verify you completed this step properly. The output should be `'orange green'`

In [None]:
# SOLUTION CELL 2.4
import pandas as pd


# COPY read_ufo_data() function here


# COPY dedupe_series() function here


# COPY extract_colors() function here


# MAIN CODE 
# CALL read_ufo_data() here to load the dataframe

# Create `Colors` Series in the DataFrame

# FOR CHECKER: print the item at index 15 from the Colors series. should be 'orange green'


**TIP** Help yourself for later. Write a boolean expression to find all rows matching a specific color from the `Colors` series. For example, use the `'orange'` color. One approach is to use: `str.find()` on the series like we did in the Small Group activity. You'll need it for the color search in 2.5
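
To illustrate the kind of boolean expression the tip describes, here is a sketch on a toy Series. The `tags` Series and the word `"fast"` are made up for this example; the idea is that `str.find()` returns `-1` when the text is absent, so comparing against `0` yields a True/False mask.

```python
import pandas as pd

# Toy Series of space-separated words, standing in for a column like Colors
tags = pd.Series(["fast quiet", "", "quiet", "loud fast"])

# str.find() gives the position of the first match, or -1 if there is none
mask = tags.str.find("fast") >= 0

print(mask.tolist())  # [True, False, False, True]
print(tags[mask])     # only the rows that contain "fast"
```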

### You Code 2.5 Assemble the final program as an interact

With all the components built, it's time to build the complete program. With the exception of a couple of lines of code, this will be an assemblage of all the work you've done earlier.

- Complete the Problem Analysis Section above. 
  - What are the 4 inputs (note: all are drop down widgets)?
  - What is the 1 output?
  
**Help with the Algorithm:**

- When you use interacts to generate your widgets for input, you usually need to set up those widgets with data before you generate them. This divides our program into the work that happens before the widgets exist and the work that happens while interacting with them.
- Before you create the widgets, you'll need to:
    - load the dataset into a dataframe
    - generate your columns (`DayOfWeek` and `Colors`)
    - create the necessary lists for 3 widgets using `dedupe_series()`
    - NOTE: `curated_colors` the 4th widget is mostly all set, but you should `insert()` the `*ANY*` into the list in case the user doesn't want to select a color.
- Under your `interact_manual` you should follow the search pattern that is used in the EETE code. You are just adding two more `if` cases for the day of the week and color filters.
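
One way to check the filtering logic before wiring it to widgets is to put the search pattern in a plain function and call it directly. The sketch below does this on toy data; the `search()` function, the `toy` frame, and its `City` and `Tags` columns are hypothetical stand-ins, not the assignment's columns.

```python
import pandas as pd

ANY = "*ANY*"


def search(frame: pd.DataFrame, city: str = ANY, tag: str = ANY) -> pd.DataFrame:
    """Narrow the frame one criterion at a time, skipping any set to *ANY*."""
    result = frame
    if city != ANY:
        result = result[result["City"] == city]            # exact match
    if tag != ANY:
        result = result[result["Tags"].str.find(tag) >= 0]  # contains match
    return result


toy = pd.DataFrame({
    "City": ["Utica", "Rome", "Utica"],
    "Tags": ["fast quiet", "quiet", "loud"],
})

print(search(toy, city="Utica", tag="quiet"))  # one matching row
print(len(search(toy)))                        # no filters applied: 3
```

Because each `if` filters the result of the previous one, any combination of criteria works without writing a separate case for every combination.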


You must write the algorithm in the Problem Analysis section, or risk not getting full credit for the assignment. 

In [None]:
# SOLUTION CELL 2.5
import pandas as pd
import numpy as np
import warnings
from IPython.display import display, HTML
from ipywidgets import interact_manual
warnings.filterwarnings('ignore')
pd.set_option('display.max_colwidth', None)


# COPY read_ufo_data() function here


# COPY dedupe_series() function here


# COPY get_day_of_week() function here


# COPY extract_colors() function here


## BEFORE INTERACT INPUTS


display(HTML("<h1>Search UFO Sightings</h1>"))
@interact_manual(TODO-4-widgets-here)
def onclick(TODO-4-arguments-here):
    # build search DF then display it
    



## Part 3: Metacognition

These questions are designed to prompt you to reflect on your learning. Reflection is part of the assignment grade so please take time to answer the questions thoughtfully.

#### 3.1 List at least 3 things you learned this week and/or throughout the process of completing this assignment?

#### 3.2 What were the challenges or roadblocks (if any) you encountered on the way to completing it?

#### 3.3 Were you prepared for this assignment? What can you do to be better prepared?

#### 3.4 Did someone (or something such as AI) help you? Did you help someone? Provide details. 

#### 3.5  Now that you have completed the assignment rate your comfort level with this week’s material. This should be an honest assessment of your ability: 

**1** ==> I don't understand this at all yet and need extra help. If you choose this please try to articulate that which you do not understand to the best of your ability in the questions and comments section below.  
**2** ==> I can do this with help or guidance from other people or resources. If you choose this level, please indicate HOW this person helped you in the questions and comments section below.   
**3** ==> I can do this on my own without any help.   
**4** ==> I can do this on my own and can explain/teach how to do it to others.

`ENTER A NUMBER 1-4 IN THE CELL BELOW`

## Part 4: Turning it in

FIRST AND FOREMOST: **Save Your work!** Yes, it auto-saves, but you should get in the habit of saving before submitting. From the menu, choose File --> Save Notebook. Or you can use the shortcut keys `CTRL+S`

### Homework Check

Check your homework before submitting. Look for errors and incomplete parts which might cost you a better grade.

In [None]:
from casstools.notebook_tools import NotebookFile
NotebookFile().check_homework()

### Homework Submission

Run this code and follow the instructions to turn in your homework.

In [None]:
from casstools.assignment import Assignment
Assignment().submit()