# Data Science Club Progress Journal

The purpose of this notebook is to create an interactive visualization detailing progress made and knowledge gained throughout the course of the UCSB Data Science Club.

If the technical skills described below are brand new to you, use this project to practice searching Google and Stack Overflow with language that is as specific as possible. This is an important skill for any programmer. For example, if you can't figure out how to create a dataframe, "pandas dataframe from list of lists" is a good Google search, and "create dataframe" is not.

To begin, run the cell below to import all the required dependencies. Contact a Data Science Officer if this cell doesn't work.

In [3]:
import matplotlib.pyplot as plt # Used to visualize data
import mplcursors # Adds interactive element to matplotlib charts
import pandas as pd # Y'all know what this is
import matplotlib.dates as mdates # Used for formatting datetimes nicely on matplotlib

## Creating Dataframe 

To begin, we need a way to organize our data. Create a list called "columns" with the values "Date", "Lessons Completed", and "Summary".

Next, create a list of lists called "rows" to fill with your weekly entries.

Create an example entry for testing purposes. 

* "Date" represents the date of the club meeting in the string format "month/day/year". 

* "Lessons Completed" will be an integer representing the number of Udemy lessons completed during that club meeting. 

* "Summary" will be a short description of your personal progress, giving anyone looking at your Progress Journal a deeper understanding of what you accomplished on that specific day. This will be the most important column, as it shows potential employers that you can think critically about the big picture of Data Science and communicate ideas in a sensible and concise way.
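As a rough sketch of the two lists described above, the cell might look like the following. The date and lesson count are taken from the first journal entry shown later in this notebook; the summary text is only a placeholder, so replace it with your own.

```python
# Column names, in the order the journal uses them
columns = ["Date", "Lessons Completed", "Summary"]

# Each inner list is one weekly entry: [date string, lessons completed, summary]
rows = [
    ["9/10/2019", 2, "Placeholder summary for testing."],  # placeholder example entry
]
```

Each inner list needs exactly one value per column, in the same order as `columns`.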

Combine your two variables "rows" and "columns" into a pandas DataFrame named `df`, and display it in the Jupyter Notebook to ensure it's working properly.
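One way to do this, sketched with a single placeholder entry (the summary text is made up for illustration), is to pass the list of lists as the data and the column names through the `columns` argument:

```python
import pandas as pd

columns = ["Date", "Lessons Completed", "Summary"]
rows = [
    ["9/10/2019", 2, "Placeholder summary for testing."],  # placeholder example entry
]

# Each inner list becomes one row; `columns` labels the values in order
df = pd.DataFrame(rows, columns=columns)

# In a notebook, a bare `df` on the last line of a cell also displays it
print(df)
```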

Check what datatypes are present in the dataframe.

For ease of analysis and plotting, convert the values in the "Date" column to a datetime object and list the dtypes to ensure the conversion worked.
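The check and the conversion could be sketched as below, assuming the dates are stored as "month/day/year" strings as described earlier. The single entry is a placeholder, and the explicit `format` argument is an optional choice that avoids ambiguity between month and day.

```python
import pandas as pd

df = pd.DataFrame(
    [["9/10/2019", 2, "Placeholder summary for testing."]],  # placeholder entry
    columns=["Date", "Lessons Completed", "Summary"],
)

# Before conversion, "Date" holds plain strings
print(df.dtypes)

# Parse the month/day/year strings into datetime values
df["Date"] = pd.to_datetime(df["Date"], format="%m/%d/%Y")

# After conversion, "Date" should report a datetime64 dtype
print(df.dtypes)
```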

## Plotting with Matplotlib

We want to visualize our dataframe in a way that is easy to understand. We'll put the date on the x-axis, and the number of lessons completed on the y-axis. If we just do the most basic Matplotlib implementation, however, we get the following visualization:


In [16]:
%matplotlib notebook
fig, ax = plt.subplots(figsize=(10, 5))  # Just specifying figure size
plt.plot(
    df["Date"],  # Dates on x-axis
    df["Lessons Completed"],  # Number of lessons completed on y-axis
)
plt.show()


There are quite a few things wrong with the visualization. To fix these quirks, we can be more specific with our parameters.

The first issue that stands out is the ticks on the x- and y-axes. The y-axis is in float format, but for this type of data we assume we are working with integers (finishing half a lesson won't be represented by .5; either you finish it or you don't).

Create a variable `yrange` with the list of explicitly stated integer values that should be displayed on the y-axis.
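You can type the integers out by hand, or derive them from the data. The sketch below takes the second approach, using the lesson counts from the example journal; starting the axis at 0 is an assumption, so adjust it to taste.

```python
import pandas as pd

# Lesson counts from the example journal entries
df = pd.DataFrame({"Lessons Completed": [2, 4, 5, 3]})

# Every integer from 0 up to and including the largest lesson count
yrange = list(range(0, int(df["Lessons Completed"].max()) + 1))
print(yrange)
```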

To fix the weird alignment of the x-axis, some more obscure matplotlib datetime modules are necessary. We set the major locator of the x-axis to every Tuesday, since that's when the Data Science Club meets.

In [11]:
from matplotlib.dates import TU, WeekdayLocator
xrange = WeekdayLocator(byweekday=TU)

With our yrange and xrange specified, do your best to create a neater visualization with the provided info. Don't feel bad copying and pasting from the key after a few tries, though.
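If you get stuck, here is a minimal sketch of how `yrange` and `xrange` can be applied to the axes. It is not the `visualize_progress` function from the key: the helper name `sketch_progress_plot`, the date format, and the axis labels are all assumptions made for illustration, and it leaves out the interactive `mplcursors` annotations.

```python
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import pandas as pd
from matplotlib.dates import TU, WeekdayLocator


def sketch_progress_plot(df, yrange, xrange):
    """Hypothetical helper: line plot with integer y ticks and Tuesday x ticks."""
    fig, ax = plt.subplots(figsize=(10, 5))
    ax.plot(df["Date"], df["Lessons Completed"], marker="o")
    ax.set_yticks(yrange)  # show only the explicit integer ticks
    ax.xaxis.set_major_locator(xrange)  # one major tick per Tuesday
    ax.xaxis.set_major_formatter(mdates.DateFormatter("%m/%d/%Y"))
    ax.set_xlabel("Date")
    ax.set_ylabel("Lessons Completed")
    fig.autofmt_xdate()  # tilt the date labels so they don't overlap
    return fig, ax


# Dates and lesson counts from the example journal entries
df = pd.DataFrame({
    "Date": pd.to_datetime(["2019-09-10", "2019-09-17", "2019-09-24", "2019-10-08"]),
    "Lessons Completed": [2, 4, 5, 3],
})
yrange = list(range(0, 6))
xrange = WeekdayLocator(byweekday=TU)

fig, ax = sketch_progress_plot(df, yrange, xrange)
```

In a notebook, follow this with `plt.show()` to render the figure.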

| Unnamed: 0 | Date | Lessons Completed | Summary |
|---|---|---|---|
| 0 | 2019-09-10 | 2 | First day at the Data \n Science Club. I compl... |
| 1 | 2019-09-17 | 4 | Moved on to work on \n NumPy and Pandas. Joine... |
| 2 | 2019-09-24 | 5 | Really did a lot of \n course work today, mast... |
| 3 | 2019-10-08 | 3 | After skipping last week, met \n back up with ... |


Line by line, copy the `visualize_progress` function from the key into the cell below, making sure you understand what each line accomplishes.

In [None]:
visualize_progress()

## Formatting Summary Annotations

If your "Summary" values are thorough enough, you are likely experiencing the issue of text being cut off by the edge of the plot. We should fix that to create a neater user experience.

1. Write a function below that adds in a newline character ("\n") every 5 words so that the text doesn't fall off the figure. 
2. Then, use the Pandas `apply` function to apply it to the "Summary" column of your DataFrame.
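The two steps above can be sketched as follows. The function name `add_newlines` and its `words_per_line` parameter are hypothetical choices, and the sample summary is placeholder text.

```python
import pandas as pd


def add_newlines(text, words_per_line=5):
    """Insert a newline after every `words_per_line` words."""
    words = text.split()
    # Group the words into chunks, then join each chunk back into one line
    lines = [
        " ".join(words[i:i + words_per_line])
        for i in range(0, len(words), words_per_line)
    ]
    return "\n".join(lines)


df = pd.DataFrame({"Summary": ["one two three four five six seven"]})

# Apply the function to every value in the "Summary" column
df["Summary"] = df["Summary"].apply(add_newlines)
print(df["Summary"].iloc[0])
```

Because `split()` treats existing newlines as whitespace, running the cell a second time does not add extra line breaks.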

Call the `visualize_progress()` function again to make sure the newline function took effect.