# MSDA Seminar Series

**Spencer Lyon**

**Week 1: 2020-05-22**

## About me

- Moved to FL with my wife and 5 kids about two years ago
- Physics and Economics undergrad at BYU (2013)
- Economics PhD from NYU Stern (2018)
- Core member of [quantecon](https://quantecon.org) team for last 7 years
    - Implemented Julia libraries + bootstrapped website
    - Lead author of the [QuantEcon datascience lectures](https://datascience.quantecon.org/)
- Authored multiple course textbooks on data analytics (with emphasis in economics) in Python and Julia
- Run a data consulting firm Valorum Data
    - Work in various industries: retail, manufacturing, power and energy, finance
    - Workflow: understand client business => gather data => build model => make better decisions => client $$ $\uparrow$
- Appointment at UCF:
    - Primarily to teach data analytics, machine learning, deep learning
    - Also contribute software/data analytics advice for research projects
- Hobbies/outside interests: 
    - Church
    - Family
    - Sports (basketball/volleyball/tennis)
    - Open source development (@sglyon on github)

## About you

- How far in program?
- What topics have you studied?
- What topics would you like to study?
- Is anybody working while doing the MSDA? If so, where?
- Internships?

## About this series

- Hands-on workshop series

- Focused on **tools** and **techniques** for a robust data science workflow
    - Includes algorithms and models: supervised learning, unsupervised learning, and (perhaps) reinforcement learning
    - Programming languages: Julia, Python, R
    - Libraries: scikit-learn, tensorflow, pytorch
    - Topics: data collecting (scraping), data cleaning, visualization, modeling, automation, deployment
    - Tools: version control (git), unit testing, editing environments (e.g. text editors, jupyter)

- Will intersperse **instruction**, **examples**, and **projects/competitions**

- Goals:
    - Work together
    - Get hands-on experience
    - Have fun!

# Looking ahead

- Content driven by **you**, prepared by **faculty/PhD students** (me), explored **together**
- TODO (see below)

# Examples

- Let's see some examples!
- Many come from around the web
- Others are projects I have worked on or am working on

## Interactive visualization: plotly

In [None]:
# plotly express ships inside the main plotly package (plotly >= 4)
import plotly.express as px
help(px.data.gapminder)
gapminder = px.data.gapminder()

# view the data
gapminder.head()

In [None]:
gapminder.describe()

In [None]:
gapminder2007 = gapminder.loc[gapminder["year"] == 2007, :]

In [None]:
# standard scatter chart
px.scatter(gapminder2007, x="gdpPercap", y="lifeExp", color="continent")

In [None]:
# add size based on population
px.scatter(gapminder2007, x="gdpPercap", y="lifeExp", color="continent", size="pop", size_max=60)

In [None]:
# add country labels on hover
px.scatter(
    gapminder2007, 
    x="gdpPercap", y="lifeExp", color="continent", 
    size="pop", size_max=60, 
    hover_name="country"
)


In [None]:
# look at all data, not just 2007, in an animation
px.scatter(
    gapminder, 
    x="gdpPercap", y="lifeExp", color="continent",
    size="pop", size_max=60, 
    hover_name="country",
    animation_frame="year", animation_group="country",   # animation
    log_x=True, range_x=[100,100000], range_y=[25,90],  # set axis type and bounds
    labels=dict(pop="Population", gdpPercap="GDP per Capita", lifeExp="Life Expectancy")  # clean labels
)

In [None]:
# animated map
px.choropleth(
    gapminder, 
    locations="iso_alpha", color="lifeExp", hover_name="country", 
    animation_frame="year",
    color_continuous_scale=px.colors.sequential.Viridis, projection="natural earth"
)

## Julia

- New language: version 1.0 released in 2018, pre-1.0 versions publicly available since the early 2010s
- General purpose, but targeted at numerical computing
- Advanced compiler technology (JIT) and programming language design
    - Leads to [*very efficient*](https://julialang.org/benchmarks/) code at runtime
    - But also can be easy and elegant to write
- See [other notebook](./julia.ipynb)

## Economics-based examples:

- Classification
- Regression
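
To make the regression item concrete, here is a minimal sketch of a one-variable ordinary least squares fit in plain Python. It is an illustration only, not one of the seminar's own examples: the four data points are made up, and the names `fit_line` and `predict` are hypothetical helpers introduced for this sketch. In practice a library such as scikit-learn would do this work.

```python
import math

# Hypothetical (made-up) observations, NOT real gapminder values:
# GDP per capita in USD and life expectancy in years.
gdp_per_capita = [1_000, 3_000, 10_000, 30_000]
life_expectancy = [55.0, 62.0, 70.0, 78.0]


def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x (one regressor)."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    # Sample covariance of (x, y) and variance of x, both up to the same factor.
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return intercept, slope


# Regress life expectancy on log10(GDP per capita), mirroring the log x-axis
# used in the animated scatter chart.
log_gdp = [math.log10(g) for g in gdp_per_capita]
intercept, slope = fit_line(log_gdp, life_expectancy)


def predict(gdp):
    """Predicted life expectancy for a given GDP per capita (hypothetical helper)."""
    return intercept + slope * math.log10(gdp)


print(f"intercept={intercept:.2f}, slope={slope:.2f}")
print(f"predicted life expectancy at $5,000: {predict(5_000):.1f}")
```

The slope here reads as the change in predicted life expectancy for a tenfold increase in GDP per capita, which is why the regressor is transformed with `log10` before fitting.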

## Need your input now

- This will be most effective if the content is driven/chosen by YOU
- Please fill out the following survey: https://forms.gle/bK2mywp4a9NmqTbp6
- Some ideas
    - Julia?
    - Version control: git and github (gitlab)
    - Deep learning (tensorflow or pytorch)
    - Data collection: web scraping or API integrations
    - 

# Resources:

- Contact info:
    - Spencer Lyon: spencer.lyon@ucf.edu
    - Some sort of chat platform? Do you already use one? Do you want one? Should we make one?
- Learning Python or Julia: [QuantEcon](https://quantecon.org)
- [Python data science handbook](https://jakevdp.github.io/PythonDataScienceHandbook/) by Jake Vanderplas
- Google!
- Documentation for libraries and languages