## Lecture 25: Course Wrap Up


### Please note: This lecture will be recorded and made available for viewing online. If you do not wish to be recorded, please adjust your camera settings accordingly. 

## Reminders/Announcements:
- Today will be the last official lecture for Math 157, Winter 2021. 
- *Please still take the time to fill out CAPES/TA evals* if you have not done so!
- Final Projects will be *collected at 8pm Pacific on Wednesday, March 17th*. Presentations need to be done *before that*.


## Final Project Summary (Ask your questions during this if you have them!!!)

Let's take a brief look at my mock 157 student to make sure we are all on the same page regarding final projects.

## ***** Participation Check **************

This participation check is *for you* to determine how close you are to finishing your 157 project. You do not need to have finished all of these yet, but hopefully the list is a good organizational tool for you. For each item, answer Yes or No on the corresponding line. This is not an exhaustive checklist, but it's a good start:

- My final project directory has a presentation file and two feedback files
    - Answer:
- I have reached out to my group and scheduled a time to present:
    - Answer:
- I understand and have defined the *main theorem/definition/algorithm/tool/etc* which is relevant to my project topic:
    - Answer:
- I have *at least one* relevant code example in my presentation which highlights the theory:
    - Answer:
- My presentation seems *close to* the length of a normal 157 lecture (it can be a bit shorter, but remember: the goal is 40-50 minutes)
    - Answer:
- I included a *visualization* in my presentation, like a graph, chart, table, picture, etc. (This is not *necessary*, but is a *really good idea* to help make your presentation more digestible. A picture is worth a thousand words!)
    - Answer:
- My presentation contains a ~2 minute participation check which is *relevant to* and *builds on* my lecture
    - Answer:
- I have my *first* exercise typed out and solved. It is relevant to my topic and at roughly the same level as a Math 157 exercise:
    - Answer:
- I have my *second* exercise typed out and solved. It is relevant to my topic and at roughly the same level as a Math 157 exercise:
    - Answer:
## *****************************************

Here are two good resources for scientific/mathematical presentations, if you want some more help:
- Terry Tao's Blog: https://terrytao.wordpress.com/career-advice/talks-are-not-the-same-as-papers/
- Gian-Carlo Rota's "Ten Lessons I Wish I Had Been Taught": https://www.ams.org/notices/199701/comm-rota.pdf

## Math 157: A Summary

Looking back on the quarter, it is easy to lose track of "what we did." If you think about it, this was a course on...
- Programming
- Visualization
- Linear Algebra
- Calculus
- Discrete Mathematics
- Graph Theory
- Number Theory
- Cryptography
- Statistics
- Natural Language Processing
- Machine Learning
- ...

I hope that all of you found at least one topic that interested you. But truthfully, the topics themselves are less important than the *theme* of this class: problem solving with technology. Being computer literate is one of the most critical skills you can leave UCSD with. Here is a real-life example.

My cousin is a lawyer in Minnesota. She spent 
- 4 years in undergrad
- 3 years in law school
- several summers interning
- several months studying for/passing the bar. 

When she got her first job at Big Law Firms R Us, she was *literally* given the task of going into a directory containing loads of law-related files (precedent, case history, etc.) and *renaming the files* by hand.

I don't know her exact pay rate, but *I can guarantee you* that Big Law Firms R Us was wasting *an absurd* amount of money paying her to do this. After this had been part of her responsibilities for several months, her husband convinced her to ask IT to write a script for it. It took ~15 minutes.

## Problem Solving with Technology

Throughout this class we have been learning how to problem solve with technology. I hope that the questions were slightly more interesting than simply renaming the files in a directory, but truthfully the techniques for solving them are very similar:
- Identify the *overarching problem/goal*
    - Rename a file
    - Compute some information about an electrical network
    - Securely send information to someone
    - Visualize/find trends in a dataset
- Identify the *key hangups*
    - There are too many files to do this by hand
    - There are a lot of power stations/connections in the network (even using a computer)
    - The information needs to be sent *quickly* and there is no prior communication allowed
    - The dataset is noisy/missing some information
- Identify the *right tools/data structures* to help analyze the problem
    - Python's `os` module
    - Create a discrete graph with nodes given by stations and edges given by connections
    - Perhaps we can use a key exchange to establish a symmetric key cipher; maybe RSA would work
    - pandas to read and handle the data, scikit-learn to analyze it?
- Read through the documentation/online resources to find out how this works in practice. Keep in mind things like:
    - Online Q&A sites (if you have a problem, someone else has probably had it before as well):
        - StackOverflow
        - MathStackExchange
        - Quora
        - CryptoStackExchange
        - CrossValidated (StatsStackExchange)
        - ...
    - Documentation
        - Even just clever Googling will lead you to the right place eventually
        - Introspection can be useful!
    - Databases
        - OEIS: http://oeis.org/
        - LMFDB: https://www.lmfdb.org/
        - ...
    - *Occasionally* go to more serious references, such as:
        - NIST Cryptographic Standards
        - Publications in ML/CS/Math
- Experiment!
    - Instead of working with files in a directory, first try renaming strings in a list
    - Start with 5% of the network, instead of the entire network
    - Play around with different values for your scheme and make sure there are no obvious weaknesses
    - Start by analyzing ~1% of your data
- Implement a *first pass* at a solution
    - os.rename()
    - find a minimal spanning tree in the graph
    - use RSA to get a symmetric key, go from there
    - impute data, plot, linear regression, etc.
- Maybe this is good enough! Maybe you need more subtle analysis for speedups, etc.
    - Optimize your code for efficiency if you need it to run over and over and over again
        - Cython
        - Profile your code: `%prun`, `%lprun`
        - Parallelize?
    - Maybe you cannot find a *perfect* solution.
        - Random/approximate structures and algorithms can be *very good*
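
To make the "first pass" for the network example concrete, here is a minimal sketch of finding a minimum spanning tree with Kruskal's algorithm in plain Python. The station names and connection weights are made up for illustration, and the function name `minimum_spanning_tree` is our own; in practice a graph library would likely do this for you.

```python
def minimum_spanning_tree(nodes, edges):
    """Kruskal's algorithm. `edges` is a list of (weight, u, v) tuples."""
    # Every node starts out as the representative of its own component.
    parent = {node: node for node in nodes}

    def find(node):
        # Follow parent pointers to the representative, shortening the path as we go.
        while parent[node] != node:
            parent[node] = parent[parent[node]]
            node = parent[node]
        return node

    tree = []
    for weight, u, v in sorted(edges):  # cheapest connections first
        root_u, root_v = find(u), find(v)
        if root_u != root_v:  # the edge joins two components, so it creates no cycle
            parent[root_u] = root_v
            tree.append((u, v, weight))
    return tree


# Hypothetical toy network: four stations, weights are made-up connection costs.
stations = ["A", "B", "C", "D"]
connections = [(1, "A", "B"), (4, "A", "C"), (2, "B", "C"), (5, "C", "D"), (3, "B", "D")]

tree = minimum_spanning_tree(stations, connections)
total_cost = sum(weight for _, _, weight in tree)
print(tree, total_cost)
```

Following the "Experiment!" step, a tiny network like this one is small enough to check by hand before running the same code on the full graph.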

## Let's be Lawyers

A useful skill is knowing how to interact with the operating system from Python. The `os` module provides this, and in many cases it lets you use Python instead of the shell.

In particular, you can iterate over the files in a directory using the `walk` function:

In [2]:
import os
for data in os.walk('.'):
    print(data)

('.', ['law'], ['Lecture25_Mar08.ipynb', '.Lecture25_Mar08.ipynb.sage-jupyter2'])
('./law', [], ['lawFile2Smith.txt', 'lawFile1Grubb.txt', 'lawFile15Bunge.txt'])


To interpret this, each tuple has:
- the current directory name
- the list of subdirectories
- the files in the current directory
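
Since each tuple has exactly three entries, it is common to unpack them directly in the `for` loop. The sketch below does this on a throwaway directory created with `tempfile`, so it does not depend on the files in this notebook's folder; the file names are copied from the output above, and the variable names are our own choice.

```python
import os
import tempfile

# Build a temporary directory tree so the example is independent of your own files.
with tempfile.TemporaryDirectory() as root:
    os.mkdir(os.path.join(root, "law"))
    for name in ["lawFile1Grubb.txt", "lawFile2Smith.txt"]:
        open(os.path.join(root, "law", name), "w").close()

    found = []
    # Each step of os.walk yields (directory path, subdirectory names, file names).
    for dirpath, dirnames, filenames in os.walk(root):
        for filename in filenames:
            # Record each file's path relative to the top of the tree.
            found.append(os.path.relpath(os.path.join(dirpath, filename), root))

print(sorted(found))
```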

Let's say we wanted to "anonymize" the law files by replacing each name with a string of `X`s of the same length. For example:
`lawFile1Grubb.txt` -> `lawFile1XXXXX.txt`

## ***** LAST PARTICIPATION CHECK OF 157 :( ****************
Write a function `anonymize(lawStr)` which achieves this. Be careful! The "prefix" may have varying length if the number is large!

In [7]:
def anonymize(lawStr):
    # Starter stub: this only drops the first character. Replace it with your solution.
    return lawStr[1:]

## *******************************************************

*Important*: Before we actually mess up our directory, we had better make sure the function works the way it should! Unit testing is crucial for avoiding mistakes in the real world:

In [0]:
anonymize('lawFile1Grubb.txt')

In [0]:
anonymize('lawFile675Stevenson.txt')
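
The two checks above can also be written as assertions, so that a mistake raises an error instead of relying on reading the output by eye. The sketch below includes one possible `anonymize`, not the only one. It assumes every file name looks like `lawFile`, then a number, then a name, then `.txt`, and it uses the `re` module, which is our choice rather than something the exercise requires.

```python
import re

def anonymize(lawStr):
    # Assumed pattern: 'lawFile', one or more digits, the name, then '.txt'.
    match = re.fullmatch(r"(lawFile\d+)(.*)(\.txt)", lawStr)
    if match is None:
        raise ValueError(f"unexpected file name: {lawStr!r}")
    prefix, name, extension = match.groups()
    # Keep the prefix and the extension; replace each character of the name with 'X'.
    return prefix + "X" * len(name) + extension

# The same two inputs as above, now with the expected outputs spelled out.
assert anonymize("lawFile1Grubb.txt") == "lawFile1XXXXX.txt"
assert anonymize("lawFile675Stevenson.txt") == "lawFile675" + "X" * len("Stevenson") + ".txt"
```

Matching the digits with `\d+` is what handles the varying prefix length mentioned in the participation check.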

Ok! Let's do it! The `os` module has a `rename` function (which is essentially the `mv` command, if you know terminal commands).

In [8]:
for dirpath, dirnames, filenames in os.walk('.'):
    if dirpath == './law':
        for file in filenames:
            os.rename(os.path.join(dirpath, file), os.path.join(dirpath, anonymize(file)))

Boom. We're all lawyers now.
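
Renaming is hard to undo, so one cautious pattern is to compute the full list of old and new names first and only touch the disk once that list looks right. Below is a minimal sketch under our own assumptions: `rename_all` and `add_prefix` are hypothetical helpers, not part of the `os` module, and the demo runs in a temporary directory. The duplicate check matters for anonymizing, because two different names of the same length under the same file number would map to the same new name, and on many systems `os.rename` would then overwrite one file with the other.

```python
import os
import tempfile

def rename_all(directory, transform, dry_run=True):
    """Plan renaming every file in `directory` to transform(old_name).

    Returns the list of (old, new) pairs; the disk is only changed if dry_run is False.
    """
    plan = [(name, transform(name)) for name in sorted(os.listdir(directory))
            if os.path.isfile(os.path.join(directory, name))]
    new_names = [new for _, new in plan]
    if len(set(new_names)) != len(new_names):
        raise ValueError("two files would end up with the same name")
    if not dry_run:
        for old, new in plan:
            os.rename(os.path.join(directory, old), os.path.join(directory, new))
    return plan

def add_prefix(name):
    # Hypothetical stand-in for anonymize, used only to keep this demo short.
    return "checked_" + name

with tempfile.TemporaryDirectory() as law_dir:
    for name in ["lawFile1Grubb.txt", "lawFile2Smith.txt"]:
        open(os.path.join(law_dir, name), "w").close()

    plan = rename_all(law_dir, add_prefix)          # dry run: nothing changes on disk
    before = sorted(os.listdir(law_dir))
    rename_all(law_dir, add_prefix, dry_run=False)  # now actually rename
    after = sorted(os.listdir(law_dir))

print(plan)
print(before, after)
```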

## Where to go from here?

Depending on your interests, you could follow up this class with more extensive study of a particular topic. But here are some things that I think are *new* and would be fun to study up on if you just want to keep exploring:
- APIs and bots: You can't do this in CoCalc easily, but I can show you on my local machine!
    - Tweepy is a good one for Twitter
- Topological Data Analysis: You can't do this in CoCalc easily, but I can show you on my local machine!
    - ripser is a good Python package
- Game development/analysis
    - pygame for game development in Python
    - AlphaZero/AlphaGo/Stockfish, etc.
- Web development/Web apps
    - Flask, Jinja, etc.
- Databases
    - MongoDB, SQL, etc.