<a href="https://colab.research.google.com/github/majadirks/nhs_teaching/blob/master/charlie_sheen.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

**Demo of using Desmos to graph data from a pandas DataFrame**



On March 1, 2011, around 3pm, Charlie Sheen started a Twitter account. A math teacher named Mr. Krall [decided to track how many followers he had](https://emergentmath.com/2011/03/03/can-you-predict-how-many-followers-charliesheen-has-right-now-exponential-growth-part-1/), because that's what math teachers do.


<img src="https://emergentmath.files.wordpress.com/2011/03/sheen2.png" alt="At 4:07 pm on 3/1/11, Charlie Sheen had 112497 followers." height="234" width="486">

<img src="https://emergentmath.files.wordpress.com/2011/03/sheen141.png" alt="At 6:12 pm on 3/1/11, Charlie Sheen had 261894 followers.  He tweeted, 'Winning! Choose your Vice...'" height="234" width="486">


In [None]:
#@title Press the "play" button to see how many Twitter followers Charlie Sheen had at various times on March 1, 2011. (Or double-click this text to see Python code.)
import pandas as pd
from datetime import datetime
import matplotlib.pyplot as plt

# Headings
MINS_ELAPSED = "Minutes since 3pm"
FOLLOWERS = "Follower Count"

# Load data of how many followers Charlie Sheen had
# since starting his Twitter account in 2011
# Source: https://emergentmath.com/2011/03/03/can-you-predict-how-many-followers-charliesheen-has-right-now-exponential-growth-part-1/

datetimes_and_follower_counts=[
# ("2011-03-01 3:00 PM", 0),  # Account created ~3pm
("2011-03-01 4:04 PM",109099),
("2011-03-01 4:07 PM",112497),
("2011-03-01 4:09 PM",116143),
("2011-03-01 4:12 PM",120195),
("2011-03-01 4:14 PM",122901),
("2011-03-01 4:18 PM",127643),
("2011-03-01 4:21 PM",129793),
("2011-03-01 4:24 PM",133357),
("2011-03-01 4:29 PM",140215),
("2011-03-01 4:32 PM",144103),
("2011-03-01 4:38 PM",149528),
("2011-03-01 4:44 PM",153848),
("2011-03-01 6:21 PM",261894),
("2011-03-01 6:24 PM",275638),
("2011-03-01 6:49 PM",302077),
("2011-03-01 7:13 PM",324595),
("2011-03-01 7:38 PM",348955),
("2011-03-01 8:45 PM",429904),
("2011-03-02 6:45 AM",729703),
("2011-03-02 8:51 AM",799008),
("2011-03-02 9:36 AM",820538),
("2011-03-02 10:23 AM",841940),
("2011-03-02 11:16 AM",865972),
("2011-03-02 12:43 PM",902291),
("2011-03-02 1:47 PM",926557),
("2011-03-02 3:09 PM",957259),
("2011-03-03 11:31 AM", 1297483),
("2011-03-04 9:43 AM", 1550073)
# As I write this code, @charliesheen has 10.5 million followers,
# but including that data point distorts the graph.
# ("2020-09-10 7:03 PM", 10500000)
]

# Parse out datetimes and follower counts
zipped_dts_and_fcs = list(zip(*datetimes_and_follower_counts))
# Parse datetime objects from strings
# dts = datetimes
dts=[datetime.strptime(x, '%Y-%m-%d %I:%M %p') for x in zipped_dts_and_fcs[0]]
fcs=zipped_dts_and_fcs[1] # follower counts
assert len(dts) == len(fcs) # Make sure we have the same number of each

# Store datetimes and follower counts in a pandas DataFrame
follower_data = pd.DataFrame(data=fcs, # follower counts
                             index=dts, # datetimes
                             columns=[FOLLOWERS])
# Calculate minutes since 3pm on March 1st
start_time=datetime(2011, 3, 1, hour = 15, minute = 0)
minutes = (follower_data.index - start_time).total_seconds() / 60
follower_data[MINS_ELAPSED] = minutes
# Reorder columns
follower_data = follower_data[[MINS_ELAPSED, FOLLOWERS]]


# Create view for March 1 alone
march1 = follower_data.truncate(after='2011-03-02') # ie 3/1/11 only
print(march1)

# Graph the March 1 data
#plt.plot(march1[MINS_ELAPSED], march1[FOLLOWERS], 'o')
#plt.xlabel('Minutes since 3pm on 3/1/11')
#plt.ylabel(FOLLOWERS)
#plt.show()

In [None]:
#@title Press the "play" button to see that March 1 data plotted on Desmos. There is also a very bad "best fit" function; try to make it better.
import IPython.display  # make sure the display submodule is loaded

# HTML and Javascript code for interacting with Desmos
def make_html(bestfit, xs_as_strs, ys_as_strs,
              xmin = 0, 
              xmax = 200000, 
              ymin = 50000,
              ymax = 2000000):
  # bestfit: A LaTeX string representing a best fit function
  # xs_as_strs: A list of strings representing x values
  # ys_as_strs: A list of strings representing y values
  # xmin, xmax, ymin, ymax: window settings for graph
  # Check the arguments themselves, not the notebook-level xs and ys
  assert len(xs_as_strs) == len(ys_as_strs), "Dimension mismatch"
  return '''
  <script src="https://www.desmos.com/api/v1.5/calculator.js?apiKey=dcb31709b452b1cf9dc26972add0fda6"></script>
  <div id="calculator" style="width: 1200px; height: 400px;"></div>
  <script>
      var elt = document.getElementById('calculator');
      var calculator = Desmos.GraphingCalculator(elt);
      
  calculator.setExpression({ id: 'graph1', latex: \'''' + bestfit + '''\' });
  calculator.setExpression({
    type: 'table',
    columns: [
      {
        latex: 'x',
        values: ''' + str(xs_as_strs) + '''
      },
      {
        latex: 'y',
        values: ''' + str(ys_as_strs) + '''
      }
    ]
  });

  calculator.setMathBounds({
    left: ''' + str(xmin) + ''',
    right: ''' + str(xmax) + ''',
    bottom: ''' + str(ymin) + ''',
    top: ''' + str(ymax) + ''',
  });

    </script>
  '''

# Convert x and y values to lists of strings,
# because that's what the Desmos API expects
xs = march1[MINS_ELAPSED].values
xmin = 0
xmax = max(xs) * 1.1
xs_as_strs = [f'{int(x)}' for x in xs]
ys = march1[FOLLOWERS].values
ymin = 0
ymax = max(ys) * 1.1
ys_as_strs = [f'{int(y)}' for y in ys]

# Display the HTML and thereby load Desmos
html = make_html('f(x)=1800x', xs_as_strs, ys_as_strs, xmin, xmax, ymin, ymax)
display(IPython.display.HTML(html))

1. (a) What type of function would best fit this data? Would this data be best represented as a linear function? Exponential function?

   (b) Use the graphing calculator above to estimate a best-fit function for the data. How well does your function fit the data?
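One way to put a number on "how well does your function fit" is to average the vertical distance between the function and the data. The sketch below is only an illustration: `mean_abs_error` and `other_guess` are made-up names, the five points are copied from the March 1 table, and `1000x + 50000` is an arbitrary guess, not the intended answer.

```python
# (minutes since 3pm, follower count) for five rows of the March 1 table
points = [(64, 109099), (67, 112497), (104, 153848), (201, 261894), (345, 429904)]

def mean_abs_error(f, points):
    """Average vertical distance between f(x) and the observed follower counts."""
    return sum(abs(f(x) - y) for x, y in points) / len(points)

def starting_guess(x):
    # The deliberately bad "best fit" shown on the graph
    return 1800 * x

def other_guess(x):
    # Hypothetical alternative, chosen arbitrarily for comparison
    return 1000 * x + 50000

print(mean_abs_error(starting_guess, points))  # 67711.6
print(mean_abs_error(other_guess, points))     # 11070.8
```

A smaller number means the function sits closer to the data on average. Try swapping in your own function from the graphing calculator.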

Mr. Krall stopped checking Twitter around 8:45 pm, and started again the next day at 6:45 am. (Because why not?) Press the "play" button to see the data from both days.

In [None]:
#@title Press the "play" button to see how many Twitter followers Charlie Sheen had at various times on March 1 and March 2, 2011

# Create view for both March 1 and March 2
march2 = follower_data.truncate(after='2011-03-03') # ie 3/1 and 3/2

#Display data for both days
print(march2)

# Get xs and ys for Desmos
xs = march2[MINS_ELAPSED].values
xmin = 0
xmax = max(xs) * 1.1
xs_as_strs = [f'{int(x)}' for x in xs]
ys = march2[FOLLOWERS].values
ymin = 0
ymax = max(ys) * 1.1
ys_as_strs = [f'{int(y)}' for y in ys]

# Graph all the data
# plt.plot(march2[MINS_ELAPSED], march2[FOLLOWERS], 'o')
# plt.xlabel('Minutes since 3pm on 3/1/11')
# plt.ylabel(FOLLOWERS)
#plt.show()
html = make_html('f(x)=1800x', xs_as_strs, ys_as_strs, xmin, xmax, ymin, ymax)
display(IPython.display.HTML(html))

2. (a) Now that we have more data, does your answer from question 1 still seem like the best type of function to model this data? Why or why not? 

   (b) Find a "best fit" function that fits the data from both days reasonably well.

Mr. Krall later added two more data points: 

*   On March 3 at 11:31 AM, Charlie Sheen had 1,297,483 followers.
*   On March 4 at 9:43 AM, Charlie Sheen had 1,550,073 followers.



In [None]:
#@title Press the "play" button to see how many Twitter followers Charlie Sheen had from March 1 to March 4, 2011

#Display data for all days
print(follower_data)

# Get xs and ys for Desmos
xs = follower_data[MINS_ELAPSED].values
xmin = 0
xmax = max(xs) * 1.1
xs_as_strs = [f'{int(x)}' for x in xs]
ys = follower_data[FOLLOWERS].values
ymin = 0
ymax = max(ys) * 1.1
ys_as_strs = [f'{int(y)}' for y in ys]

# Graph all the data
#plt.plot(follower_data[MINS_ELAPSED], follower_data[FOLLOWERS], 'o')
#plt.xlabel('Minutes since 3pm on 3/1/11')
#plt.ylabel(FOLLOWERS)
#plt.show()
html = make_html('f(x)=1800x', xs_as_strs, ys_as_strs, xmin, xmax, ymin, ymax)
display(IPython.display.HTML(html))

3. (a) Now that we have more data, does your answer from question 2 still seem like the best type of function to model this data? Why or why not?

   (b) Find a "best fit" function that fits the data from all four days reasonably well.
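One possible way to test whether a straight line is still a reasonable model is to look at the residuals (data minus line) of a least-squares line. This is a rough sketch, not the method used elsewhere in this notebook: it uses the standard least-squares formulas on six points copied from the four-day table, and the variable names are made up for illustration.

```python
# (minutes since 3pm on March 1, follower count) for six rows spread over the four days
points = [(64, 109099), (345, 429904), (945, 729703),
          (1449, 957259), (2671, 1297483), (4003, 1550073)]

n = len(points)
mean_x = sum(x for x, _ in points) / n
mean_y = sum(y for _, y in points) / n

# Least-squares slope and intercept for the line y = slope * x + intercept
slope = (sum((x - mean_x) * (y - mean_y) for x, y in points)
         / sum((x - mean_x) ** 2 for x, _ in points))
intercept = mean_y - slope * mean_x

# Residual = observed follower count minus the line's prediction
residuals = [y - (slope * x + intercept) for x, y in points]
for (x, _), r in zip(points, residuals):
    print(f"x = {x:>4}: data minus line = {r:+,.0f}")
```

If the residuals are negative at both ends and positive in the middle, the data bends away from the line, which suggests trying a different type of function.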

It looks like Charlie Sheen gains followers more slowly as time goes on. Let's analyze the **slope** of this curve over time; that might help us predict where his follower count is heading!

Let's try to create a graph of (change in followers per minute) vs. (time).
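As a quick worked example of what "change in followers per minute" means, here is the slope between two pairs of consecutive readings from the table. The helper name `followers_per_minute` is made up for this sketch; the data points and their minute counts come from the table above.

```python
def followers_per_minute(p1, p2):
    """Slope between two (minutes since 3pm, follower count) readings."""
    (x1, y1), (x2, y2) = p1, p2
    return (y2 - y1) / (x2 - x1)

# 4:04 PM to 4:07 PM on March 1
early = followers_per_minute((64, 109099), (67, 112497))
# March 3 at 11:31 AM to March 4 at 9:43 AM
late = followers_per_minute((2671, 1297483), (4003, 1550073))

print(round(early, 1))  # 1132.7
print(round(late, 1))   # 189.6
```

Each point on the slope graph is one of these calculations, plotted at the time of the later reading.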

In [None]:
#@title Press the "Play" button to see how the slope (change-in-followers per minute) changes over time
DELTAS = "Change in Followers"
CHANGE_PER_MIN = "Change in Followers Per Minute"
# DELTAS = current follower count - previous follower count
follower_data[DELTAS] = follower_data[FOLLOWERS] -\
                         follower_data[FOLLOWERS].shift(1)
# CHANGE_PER_MIN = (change in followers) / 
#                       (minutes elapsed since last measurement)
follower_data[CHANGE_PER_MIN] = follower_data[DELTAS].div(
    follower_data[MINS_ELAPSED] - follower_data[MINS_ELAPSED].shift(1))
# Print the data
print(follower_data[[MINS_ELAPSED, CHANGE_PER_MIN]])


# Get xs and ys for Desmos
xs = follower_data[MINS_ELAPSED].values[1:] # Eliminate leading NaN
xmin = 0
xmax = max(xs) * 1.1 
xs_as_strs = [f'{int(x)}' for x in xs]
ys = follower_data[CHANGE_PER_MIN].values[1:] # Match xs
ymin = 0
ymax = max(ys) * 1.1
ys_as_strs = [f'{int(y)}' for y in ys]

# Plot the slopes
#plt.plot(follower_data[MINS_ELAPSED],
#         follower_data[CHANGE_PER_MIN],
#         'o')
#plt.xlabel("Minutes since 3pm on 3/1/2011")
#plt.ylabel("Change in follower count per minute")
#plt.show()

html = make_html(r'f(x)=550', xs_as_strs, ys_as_strs, xmin, xmax, ymin, ymax)
display(IPython.display.HTML(html))

4. The y-values get smaller and smaller as time goes on. Why is that? Explain what this graph "means."

5. What type of function might we use to model this data?