- Due Thursday 11/12 no later than 9:00 a.m., send email to datascience@codeup.com
- Submit link to GitHub notebook that asks and answers questions - document the work you do to justify findings
- Compose an email with the answers to the questions/your findings, and in the email, include the link to your notebook in GitHub and attach your slide. 
- You will not present this, so be sure that the details you need your leader to convey/understand are clearly communicated in the email.
- The slide should read like an executive summary and be ready to present.
- Continue using best practices of acquire.py, prepare.py, etc. 
- No modeling to be done, and no need to split the data into train/validate/test. 
- alumni.codeup.com has info about cohorts/dates/names

- 1. Which lesson appears to attract the most traffic consistently across cohorts (per program)?
- 2. Is there a cohort that referred to a lesson significantly more that other cohorts seemed to gloss over? 
- 3. Are there students who, when active, hardly access the curriculum? If so, what information do you have about these students? 
- 4. Is there any suspicious activity, such as users/machines/etc accessing the curriculum who shouldn’t be? Does it appear that any web-scraping is happening? Are there any suspicious IP addresses? Any odd user-agents? 
- 5. At some point in the last year, ability for students and alumni to cross-access curriculum (web dev to ds, ds to web dev) should have been shut off. Do you see any evidence of that happening? Did it happen before? 
- 6. What topics are grads continuing to reference after graduation and into their jobs (for each program)? 
- 7. Which lessons are least accessed? 
- 8. Anything else I should be aware of? 

In [1]:
import numpy as np
import pandas as pd
import math
from sklearn import metrics

from scipy.stats import entropy

import warnings
warnings.filterwarnings("ignore")
import matplotlib.pyplot as plt
import matplotlib.dates as mdates #to format dates on our plots
%matplotlib inline
import seaborn as sns

In [2]:
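df = pd.read_csv('curriculum.txt',
                 engine='python',  # regex separators need the python engine
                 header=None,
                 index_col=False,
                 # split on whitespace that is outside double quotes and square brackets
                 sep=r'\s(?=(?:[^"]*"[^"]*")*[^"]*$)(?![^\[]*\])',
                 na_values='"-"')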

In [3]:
df.head()

Unnamed: 0,0,1,2,3,4,5
0,2018-01-26,09:55:03,/,1,8.0,97.105.19.61
1,2018-01-26,09:56:02,java-ii,1,8.0,97.105.19.61
2,2018-01-26,09:56:05,java-ii/object-oriented-programming,1,8.0,97.105.19.61
3,2018-01-26,09:56:06,slides/object_oriented_programming,1,8.0,97.105.19.61
4,2018-01-26,09:56:24,javascript-i/conditionals,2,22.0,97.105.19.61


In [4]:
df.columns = ['date', 'time', 'page_viewed', 'user_id', 'cohort_id', 'ip']

In [5]:
df.head()

Unnamed: 0,date,time,page_viewed,user_id,cohort_id,ip
0,2018-01-26,09:55:03,/,1,8.0,97.105.19.61
1,2018-01-26,09:56:02,java-ii,1,8.0,97.105.19.61
2,2018-01-26,09:56:05,java-ii/object-oriented-programming,1,8.0,97.105.19.61
3,2018-01-26,09:56:06,slides/object_oriented_programming,1,8.0,97.105.19.61
4,2018-01-26,09:56:24,javascript-i/conditionals,2,22.0,97.105.19.61


In [6]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 719459 entries, 0 to 719458
Data columns (total 6 columns):
 #   Column       Non-Null Count   Dtype  
---  ------       --------------   -----  
 0   date         719459 non-null  object 
 1   time         719459 non-null  object 
 2   page_viewed  719458 non-null  object 
 3   user_id      719459 non-null  int64  
 4   cohort_id    674619 non-null  float64
 5   ip           719459 non-null  object 
dtypes: float64(1), int64(1), object(4)
memory usage: 32.9+ MB


In [7]:
no_cohort_df = df[df['cohort_id'].isnull()]

In [8]:
no_cohort_df.head()

Unnamed: 0,date,time,page_viewed,user_id,cohort_id,ip
411,2018-01-26,16:46:16,/,48,,97.105.19.61
412,2018-01-26,16:46:24,spring/extra-features/form-validation,48,,97.105.19.61
425,2018-01-26,17:54:24,/,48,,97.105.19.61
435,2018-01-26,18:32:03,/,48,,97.105.19.61
436,2018-01-26,18:32:17,mysql/relationships/joins,48,,97.105.19.61


**Drop rows with null values; type conversions are done after the merge**

In [9]:
df.dropna(inplace=True)
# df.cohort_id = df.cohort_id.astype('int')
# df['date'] = df.date + " " + df.time
# df.drop(columns=('time'), inplace=True)
# df.date = pd.to_datetime(df.date)
# df = df.set_index('date')
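
Before dropping, it can help to see how many rows each column would cost. The following is a minimal sketch on a small stand-in frame, not the real log: `sample` and its values are invented for illustration, and only the column names match the ones assigned above.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for df; the values are invented for illustration.
sample = pd.DataFrame({
    'page_viewed': ['/', 'java-ii', None, 'mysql'],
    'user_id': [1, 1, 2, 48],
    'cohort_id': [8.0, 8.0, 22.0, np.nan],
})

null_counts = sample.isnull().sum()            # nulls per column
rows_lost = sample.isnull().any(axis=1).sum()  # rows that dropna() would remove
cleaned = sample.dropna()

print(null_counts)
print(f"rows dropped: {rows_lost} of {len(sample)}")
```

Run against the real `df`, the same two lines would show whether the dropped rows come mostly from `cohort_id` or from `page_viewed`.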

In [10]:
df.head()

Unnamed: 0,date,time,page_viewed,user_id,cohort_id,ip
0,2018-01-26,09:55:03,/,1,8.0,97.105.19.61
1,2018-01-26,09:56:02,java-ii,1,8.0,97.105.19.61
2,2018-01-26,09:56:05,java-ii/object-oriented-programming,1,8.0,97.105.19.61
3,2018-01-26,09:56:06,slides/object_oriented_programming,1,8.0,97.105.19.61
4,2018-01-26,09:56:24,javascript-i/conditionals,2,22.0,97.105.19.61


In [11]:
cohort = pd.read_csv('cohort_name.csv')

In [12]:
cohort.head()

Unnamed: 0.1,Unnamed: 0,Unnamed: 1,Unnamed: 2,Unnamed: 3,Unnamed: 4,Unnamed: 5,Unnamed: 6
0,,cohort_id,name,start_date,end_date,program_id,
1,,1,Arches,2014-02-04,2014-04-22,1,
2,,2,Badlands,2014-06-04,2014-08-22,1,
3,,3,Carlsbad,2014-09-04,2014-11-05,1,
4,,4,Denali,2014-10-20,2015-01-18,1,


In [46]:
cohort[cohort.name == 'Ada']

Unnamed: 0,cohort_id,name,start_date,end_date
30,30,Ada,2019-02-04,2019-06-16


In [13]:
cohort.columns = cohort.iloc[0]

In [14]:
cohort.head()

Unnamed: 0,NaN,cohort_id,name,start_date,end_date,program_id,NaN.1
0,,cohort_id,name,start_date,end_date,program_id,
1,,1,Arches,2014-02-04,2014-04-22,1,
2,,2,Badlands,2014-06-04,2014-08-22,1,
3,,3,Carlsbad,2014-09-04,2014-11-05,1,
4,,4,Denali,2014-10-20,2015-01-18,1,


In [15]:
cohort = cohort.iloc[1:]

In [16]:
cohort.head()

Unnamed: 0,NaN,cohort_id,name,start_date,end_date,program_id,NaN.1
1,,1,Arches,2014-02-04,2014-04-22,1,
2,,2,Badlands,2014-06-04,2014-08-22,1,
3,,3,Carlsbad,2014-09-04,2014-11-05,1,
4,,4,Denali,2014-10-20,2015-01-18,1,
5,,5,Everglades,2014-11-18,2015-02-24,1,


In [17]:
cohort = cohort[['cohort_id', 'name', 'start_date', 'end_date']]

In [18]:
cohort.head()

Unnamed: 0,cohort_id,name,start_date,end_date
1,1,Arches,2014-02-04,2014-04-22
2,2,Badlands,2014-06-04,2014-08-22
3,3,Carlsbad,2014-09-04,2014-11-05
4,4,Denali,2014-10-20,2015-01-18
5,5,Everglades,2014-11-18,2015-02-24


In [19]:
cohort.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 46 entries, 1 to 46
Data columns (total 4 columns):
 #   Column      Non-Null Count  Dtype 
---  ------      --------------  ----- 
 0   cohort_id   46 non-null     object
 1   name        46 non-null     object
 2   start_date  46 non-null     object
 3   end_date    46 non-null     object
dtypes: object(4)
memory usage: 1.6+ KB


In [20]:
cohort.cohort_id = cohort.cohort_id.astype('int')


In [49]:
df.cohort_id.value_counts()

28.0    60315
33.0    40168
29.0    37548
53.0    36047
24.0    35624
57.0    32447
56.0    31670
22.0    30328
51.0    29688
58.0    28354
32.0    28333
23.0    28329
26.0    27637
52.0    27518
25.0    25427
31.0    25253
34.0    25181
59.0    22425
27.0    20447
55.0    20410
61.0    11774
14.0     9495
1.0      8884
62.0     8718
21.0     7444
17.0     4925
13.0     2733
18.0     2005
8.0      1712
19.0     1165
16.0      743
15.0      691
7.0       495
12.0      302
11.0      208
2.0        93
6.0        72
9.0         5
4.0         4
5.0         1
Name: cohort_id, dtype: int64

In [21]:
result = pd.merge(df, cohort, on='cohort_id')
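
An inner merge silently drops log rows whose `cohort_id` has no match in the cohort table. One way to check for that is sketched below, assuming small stand-in frames (`log` and `names` are hypothetical and their values invented); `how`, `indicator`, and `validate` are standard `pd.merge` parameters.

```python
import pandas as pd

# Hypothetical stand-ins for df and cohort; the values are invented.
log = pd.DataFrame({'user_id': [1, 2, 3], 'cohort_id': [8.0, 22.0, 99.0]})
names = pd.DataFrame({'cohort_id': [8, 22], 'name': ['Hampton', 'Teddy']})

# how='left' keeps every log row, indicator=True adds a _merge column,
# and validate='m:1' raises MergeError if a cohort_id repeats in names.
checked = pd.merge(log, names, on='cohort_id', how='left',
                   indicator=True, validate='m:1')

# Rows present in the log but missing from the cohort table.
unmatched = checked[checked['_merge'] == 'left_only']
print(unmatched)
```

If `unmatched` is empty on the real data, the inner merge used here loses nothing.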

In [22]:
result.head()

Unnamed: 0,date,time,page_viewed,user_id,cohort_id,ip,name,start_date,end_date
0,2018-01-26,09:55:03,/,1,8.0,97.105.19.61,Hampton,2015-09-22,2016-02-06
1,2018-01-26,09:56:02,java-ii,1,8.0,97.105.19.61,Hampton,2015-09-22,2016-02-06
2,2018-01-26,09:56:05,java-ii/object-oriented-programming,1,8.0,97.105.19.61,Hampton,2015-09-22,2016-02-06
3,2018-01-26,09:56:06,slides/object_oriented_programming,1,8.0,97.105.19.61,Hampton,2015-09-22,2016-02-06
4,2018-01-26,10:40:15,javascript-i/functions,1,8.0,97.105.19.61,Hampton,2015-09-22,2016-02-06


In [23]:
result.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 674618 entries, 0 to 674617
Data columns (total 9 columns):
 #   Column       Non-Null Count   Dtype  
---  ------       --------------   -----  
 0   date         674618 non-null  object 
 1   time         674618 non-null  object 
 2   page_viewed  674618 non-null  object 
 3   user_id      674618 non-null  int64  
 4   cohort_id    674618 non-null  float64
 5   ip           674618 non-null  object 
 6   name         674618 non-null  object 
 7   start_date   674618 non-null  object 
 8   end_date     674618 non-null  object 
dtypes: float64(1), int64(1), object(7)
memory usage: 51.5+ MB


In [24]:
result.cohort_id = result.cohort_id.astype('int')
# combine date and time into a single timestamp and use it as the index
result['date'] = result.date + " " + result.time
result.drop(columns='time', inplace=True)
result.date = pd.to_datetime(result.date)
result = result.set_index('date')

In [25]:
result.head()

date,page_viewed,user_id,cohort_id,ip,name,start_date,end_date
2018-01-26 09:55:03,/,1,8,97.105.19.61,Hampton,2015-09-22,2016-02-06
2018-01-26 09:56:02,java-ii,1,8,97.105.19.61,Hampton,2015-09-22,2016-02-06
2018-01-26 09:56:05,java-ii/object-oriented-programming,1,8,97.105.19.61,Hampton,2015-09-22,2016-02-06
2018-01-26 09:56:06,slides/object_oriented_programming,1,8,97.105.19.61,Hampton,2015-09-22,2016-02-06
2018-01-26 10:40:15,javascript-i/functions,1,8,97.105.19.61,Hampton,2015-09-22,2016-02-06


In [27]:
result.name.value_counts()

Staff         60315
Ceres         40168
Zion          37548
Fortuna       36047
Voyageurs     35624
Ganymede      32447
Apex          31670
Teddy         30328
Deimos        29688
Hyperion      28354
Betelgeuse    28333
Ulysses       28329
Xanadu        27637
Europa        27518
Wrangell      25427
Andromeda     25253
Bayes         25181
Darden        22425
Yosemite      20447
Curie         20410
Bash          11774
Lassen         9495
Arches         8884
Jupiter        8718
Sequoia        7444
Olympic        4925
Kings          2733
Pinnacles      2005
Hampton        1712
Quincy         1165
Niagara         743
Mammoth         691
Glacier         495
Joshua          302
Ike             208
Badlands         93
Franklin         72
Apollo            5
Denali            4
Everglades        1
Name: name, dtype: int64

In [26]:
pd.crosstab(result.name, result.page_viewed)

name,%20https://github.com/RaulCPena,",%20https://github.com/RaulCPena",.git,.gitignore,.well-known/assetlinks.json,/,00_,00_index,01_intro,02_listing_files,...,web-design/ui/typography,web-design/ui/visuals,web-design/ux,web-design/ux/layout,web-design/ux/layout/.json,web-design/ux/purpose,web-dev-day-two,working-with-time-series-data,wp-admin,wp-login
Andromeda,0,0,0,0,0,1156,0,0,0,0,...,9,11,0,6,0,8,0,0,0,0
Apex,0,0,0,0,0,1244,0,0,0,0,...,20,19,0,17,0,22,0,0,0,0
Apollo,0,0,0,0,0,1,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
Arches,0,0,0,0,0,622,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
Badlands,0,0,0,0,0,17,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
Bash,0,0,0,0,0,532,0,0,0,0,...,0,1,0,1,0,0,2,0,0,0
Bayes,0,0,0,0,0,1842,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
Betelgeuse,0,0,0,0,0,868,0,0,0,0,...,26,30,0,38,0,32,0,0,0,0
Ceres,0,0,0,0,0,1620,0,0,0,0,...,33,34,0,34,1,28,0,0,0,0
Curie,1,1,0,0,0,1523,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [41]:
data_science = result[result.name.isin(['Curie', 'Bayes', 'Ada', 'Darden'])]

In [42]:
data_science.head()

date,page_viewed,user_id,cohort_id,ip,name,start_date,end_date
2019-08-20 09:39:58,/,466,34,97.105.19.58,Bayes,2019-08-19,2020-01-30
2019-08-20 09:39:59,/,467,34,97.105.19.58,Bayes,2019-08-19,2020-01-30
2019-08-20 09:39:59,/,468,34,97.105.19.58,Bayes,2019-08-19,2020-01-30
2019-08-20 09:40:02,/,469,34,97.105.19.58,Bayes,2019-08-19,2020-01-30
2019-08-20 09:40:08,/,470,34,97.105.19.58,Bayes,2019-08-19,2020-01-30


In [30]:
data_science.shape

(68016, 7)

In [35]:
web_dev = result[~result.name.isin(['Curie', 'Bayes', 'Ada', 'Darden'])]

In [36]:
web_dev.shape

(606602, 7)

In [37]:
web_dev.name.value_counts()

Staff         60315
Ceres         40168
Zion          37548
Fortuna       36047
Voyageurs     35624
Ganymede      32447
Apex          31670
Teddy         30328
Deimos        29688
Hyperion      28354
Betelgeuse    28333
Ulysses       28329
Xanadu        27637
Europa        27518
Wrangell      25427
Andromeda     25253
Yosemite      20447
Bash          11774
Lassen         9495
Arches         8884
Jupiter        8718
Sequoia        7444
Olympic        4925
Kings          2733
Pinnacles      2005
Hampton        1712
Quincy         1165
Niagara         743
Mammoth         691
Glacier         495
Joshua          302
Ike             208
Badlands         93
Franklin         72
Apollo            5
Denali            4
Everglades        1
Name: name, dtype: int64

In [50]:
pd.crosstab(data_science.name, data_science.page_viewed)

In [43]:
data_science.name.value_counts()

Bayes     25181
Darden    22425
Curie     20410
Name: name, dtype: int64

In [45]:
result[result.name == 'Ada']

date,page_viewed,user_id,cohort_id,ip,name,start_date,end_date


In [60]:
data_science.groupby(['name','user_id']).page_viewed.value_counts()

name    user_id  page_viewed                                      
Bayes   358      search/search_index.json                             63
                 6-regression/1-overview                              28
                 10-anomaly-detection/1-overview                      22
                 10-anomaly-detection/AnomalyDetectionCartoon.jpeg    22
                 5-stats/3-probability-distributions                  19
                                                                      ..
Darden  785      sql/mysql-overview                                    1
                 timeseries/modeling-lesson1                           1
                 timeseries/prep                                       1
                 timeseries/project                                    1
                 timeseries/working-with-time-series-data              1
Name: page_viewed, Length: 10383, dtype: int64

# 1. Which lesson appears to attract the most traffic consistently across cohorts (per program)?


In [61]:
bayes = data_science[data_science.name == 'Bayes']

In [79]:
bayes.page_viewed.value_counts().head(20)

/                                                    1842
1-fundamentals/modern-data-scientist.jpg              626
1-fundamentals/AI-ML-DL-timeline.jpg                  624
1-fundamentals/1.1-intro-to-data-science              615
search/search_index.json                              551
6-regression/1-overview                               521
10-anomaly-detection/AnomalyDetectionCartoon.jpeg     386
10-anomaly-detection/1-overview                       383
6-regression/5.0-evaluate                             333
5-stats/3-probability-distributions                   320
5-stats/4.2-compare-means                             315
appendix/cli-git-overview                             311
6-regression/7.0-model                                310
6-regression/4.0-explore                              267
6-regression/3.0-split-and-scale                      260
7-classification/3-prep                               256
4-python/7.4.3-dataframes                             251
7-classificati

In [71]:
darden = data_science[data_science.name == 'Darden']

In [75]:
darden.page_viewed.value_counts().head(20)

/                                           2041
classification/overview                      759
classification/scale_features_or_not.svg     590
sql/mysql-overview                           513
1-fundamentals/modern-data-scientist.jpg     470
1-fundamentals/AI-ML-DL-timeline.jpg         470
1-fundamentals/1.1-intro-to-data-science     460
stats/compare-means                          338
classification/logistic-regression           334
classification/prep                          321
search/search_index.json                     300
1-fundamentals/DataToAction_v2.jpg           284
classification/explore                       282
classification/evaluation                    280
1-fundamentals/1.2-data-science-pipeline     271
classification/project                       252
classification/acquire                       252
stats/probability-distributions              246
python/data-types-and-variables              235
stats/correlation                            234
Name: page_viewed, d

In [76]:
curie = data_science[data_science.name == 'Curie']

In [77]:
curie.page_viewed.value_counts().head(20)

/                                                    1523
6-regression/1-overview                               595
search/search_index.json                              480
1-fundamentals/modern-data-scientist.jpg              467
1-fundamentals/AI-ML-DL-timeline.jpg                  465
1-fundamentals/1.1-intro-to-data-science              461
3-sql/1-mysql-overview                                441
10-anomaly-detection/AnomalyDetectionCartoon.jpeg     345
10-anomaly-detection/1-overview                       345
4-python/8.4.3-dataframes                             260
4-python/8.4.4-advanced-dataframes                    246
4-python/3-data-types-and-variables                   234
4-python/5-functions                                  203
5-stats/4.2-compare-means                             197
5-stats/2-simulation                                  193
appendix/cli-git-overview                             190
3-sql/7-functions                                     185
7-classificati

In [80]:
bayes.head()

date,page_viewed,user_id,cohort_id,ip,name,start_date,end_date
2019-08-20 09:39:58,/,466,34,97.105.19.58,Bayes,2019-08-19,2020-01-30
2019-08-20 09:39:59,/,467,34,97.105.19.58,Bayes,2019-08-19,2020-01-30
2019-08-20 09:39:59,/,468,34,97.105.19.58,Bayes,2019-08-19,2020-01-30
2019-08-20 09:40:02,/,469,34,97.105.19.58,Bayes,2019-08-19,2020-01-30
2019-08-20 09:40:08,/,470,34,97.105.19.58,Bayes,2019-08-19,2020-01-30


In [98]:
web_dev.name.value_counts()

Staff         60315
Ceres         40168
Zion          37548
Fortuna       36047
Voyageurs     35624
Ganymede      32447
Apex          31670
Teddy         30328
Deimos        29688
Hyperion      28354
Betelgeuse    28333
Ulysses       28329
Xanadu        27637
Europa        27518
Wrangell      25427
Andromeda     25253
Yosemite      20447
Bash          11774
Lassen         9495
Arches         8884
Jupiter        8718
Sequoia        7444
Olympic        4925
Kings          2733
Pinnacles      2005
Hampton        1712
Quincy         1165
Niagara         743
Mammoth         691
Glacier         495
Joshua          302
Ike             208
Badlands         93
Franklin         72
Apollo            5
Denali            4
Everglades        1
Name: name, dtype: int64

In [96]:
ceres = web_dev[web_dev.name == 'Ceres']

In [97]:
ceres.page_viewed.value_counts().head(20)

/                                                                            1620
search/search_index.json                                                     1376
javascript-i                                                                  977
toc                                                                           909
html-css                                                                      753
java-iii                                                                      674
java-ii                                                                       667
jquery                                                                        632
mysql                                                                         617
spring                                                                        546
javascript-ii                                                                 519
java-i                                                                        510
html-css/css-i/f

In [99]:
zion = web_dev[web_dev.name == 'Zion']

In [101]:
zion.page_viewed.value_counts().head(20)

/                                                                            1756
toc                                                                          1457
javascript-i                                                                  868
java-iii                                                                      742
search/search_index.json                                                      689
spring                                                                        650
html-css                                                                      649
javascript-ii                                                                 637
java-ii                                                                       613
mysql                                                                         598
java-i                                                                        593
jquery                                                                        559
spring/fundament

In [102]:
fortuna = web_dev[web_dev.name == 'Fortuna']

In [103]:
fortuna.page_viewed.value_counts().head(20)

/                                    1962
toc                                  1273
search/search_index.json              989
java-iii                              767
javascript-i                          756
java-ii                               637
spring                                616
html-css                              578
mysql                                 571
java-i                                538
jquery                                501
javascript-ii                         482
java-iii/servlets                     416
java-iii/jsp-and-jstl                 402
mysql/tables                          373
java-i/syntax-types-and-variables     358
java-i/introduction-to-java           356
mysql/basic-statements                348
spring/fundamentals/controllers       343
appendix                              337
Name: page_viewed, dtype: int64

In [104]:
voyageurs = web_dev[web_dev.name == 'Voyageurs']

In [109]:
voyageurs.page_viewed.value_counts().head(20)

/                                      2098
javascript-i                            884
java-iii                                770
java-ii                                 754
mysql                                   663
spring                                  650
java-i                                  639
javascript-ii                           584
jquery                                  583
html-css                                528
java-i/introduction-to-java             447
mysql/databases                         439
mysql/tables                            437
mysql/users                             410
java-iii/servlets                       397
appendix                                393
mysql/basic-statements                  384
java-ii/object-oriented-programming     380
java-iii/jsp-and-jstl                   367
javascript-i/javascript-with-html       354
Name: page_viewed, dtype: int64

In [107]:
ganymede = web_dev[web_dev.name == 'Ganymede']

In [110]:
ganymede.page_viewed.value_counts().head(20)

/                                      1618
search/search_index.json               1050
toc                                     810
javascript-i                            694
java-iii                                620
java-ii                                 607
appendix                                526
jquery                                  496
javascript-ii                           483
java-i                                  482
mysql                                   474
html-css                                472
spring                                  465
java-iii/servlets                       328
java-i/syntax-types-and-variables       317
java-iii/jsp-and-jstl                   308
java-ii/collections                     302
java-ii/object-oriented-programming     298
java-ii/arrays                          286
mysql/databases                         282
Name: page_viewed, dtype: int64

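**For data science, the `1-fundamentals` pages (the intro lesson and its images) rank near the top for every cohort shown above. For web dev, the Java (`java-i` through `java-iii`) and JavaScript (`javascript-i`) module pages draw the most traffic after the home page, `toc`, and the search index.**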
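
The per-cohort lists above can also be produced in one pass instead of one variable per cohort. This is a minimal sketch under assumptions: `views` is a hypothetical stand-in for `result` with invented rows, and the `non_lessons` list is a guess at which pages are navigation or search rather than lessons.

```python
import pandas as pd

# Hypothetical stand-in for result; the rows are invented.
views = pd.DataFrame({
    'name': ['Bayes', 'Bayes', 'Bayes', 'Bayes', 'Curie', 'Curie', 'Curie'],
    'page_viewed': ['/', '6-regression/1-overview', '6-regression/1-overview',
                    'search/search_index.json', '/', '4-python/5-functions',
                    '4-python/5-functions'],
})

# Assumed list of non-lesson pages to exclude; adjust as needed.
non_lessons = ['/', 'toc', 'search/search_index.json', 'mkdocs/search_index.json']
lessons = views[~views.page_viewed.isin(non_lessons)]

# value_counts within each cohort is sorted descending, so head(n) per
# group keeps each cohort's n most-viewed pages.
top_pages = (lessons.groupby('name').page_viewed
                    .value_counts()
                    .groupby(level='name')
                    .head(3))
print(top_pages)
```

Filtering out `/`, `toc`, and the search index first keeps the ranking focused on lesson pages, which is what the question asks about.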

# 2. Is there a cohort that referred to a lesson significantly more that other cohorts seemed to gloss over?

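**Curie appears to access the Python module more than the other data science cohorts: four `4-python` pages show up in Curie's visible top-20 list, compared with one Python page each in the visible lists for Bayes and Darden. This is based on raw counts only; no significance test was run.**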
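
Raw counts favor larger or more active cohorts, so comparing each cohort's share of views per module is a fairer test of this claim. The following is a minimal sketch, assuming a hypothetical stand-in frame `views` with invented rows, and treating the text before the first `/` as the module name.

```python
import pandas as pd

# Hypothetical stand-in for data_science; the rows are invented.
views = pd.DataFrame({
    'name': ['Curie'] * 4 + ['Bayes'] * 4,
    'page_viewed': ['4-python/5-functions', '4-python/8.4.3-dataframes',
                    '4-python/3-data-types-and-variables', '6-regression/1-overview',
                    '4-python/7.4.3-dataframes', '6-regression/1-overview',
                    '6-regression/7.0-model', '5-stats/4.2-compare-means'],
})

# Module = text before the first '/', e.g. '4-python'.
views['module'] = views.page_viewed.str.split('/').str[0]

# normalize='index' turns each cohort's row into proportions that sum to 1.
share = pd.crosstab(views.name, views.module, normalize='index')
print(share['4-python'])
```

One caveat on the real data: Darden's paths drop the numeric prefix (`python/...` rather than `4-python/...`), so module names would need to be harmonized before cohorts can be compared this way.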

In [111]:
apex = web_dev[web_dev.name == 'Apex']

In [112]:
apex.page_viewed.value_counts().head(20)

search/search_index.json               1361
/                                      1244
toc                                     799
html-css                                708
java-iii                                616
javascript-i                            610
java-ii                                 595
spring                                  572
java-i                                  528
mysql                                   445
jquery                                  437
javascript-ii                           431
appendix                                426
java-i/syntax-types-and-variables       346
java-iii/servlets                       338
java-ii/object-oriented-programming     334
mysql/tables                            332
java-ii/arrays                          308
mysql/databases                         306
java-ii/collections                     294
Name: page_viewed, dtype: int64

In [113]:
teddy = web_dev[web_dev.name == 'Teddy']

In [114]:
teddy.page_viewed.value_counts().head(20)

/                                   1754
java-iii                             712
spring                               707
mysql                                631
mkdocs/search_index.json             595
javascript-i                         589
java-i                               501
jquery                               498
java-ii                              494
appendix                             488
javascript-ii                        468
mysql/tables                         387
mysql/databases                      382
mysql/basic-statements               360
javascript-i/functions               360
mysql/users                          356
javascript-i/loops                   337
javascript-i/conditionals            337
javascript-ii/promises               311
spring/fundamentals/repositories     287
Name: page_viewed, dtype: int64

In [115]:
deimos = web_dev[web_dev.name == 'Deimos']

In [116]:
deimos.page_viewed.value_counts().head(20)

/                                              1319
javascript-i                                    696
search/search_index.json                        662
html-css                                        609
toc                                             569
java-iii                                        517
spring                                          512
java-ii                                         506
mysql                                           498
jquery                                          466
java-i                                          434
javascript-ii                                   392
mysql/tables                                    371
mysql/databases                                 359
java-iii/jsp-and-jstl                           349
mysql/users                                     342
java-iii/servlets                               326
html-css/css-ii/bootstrap-introduction          325
html-css/css-i/flexbox/flexbox-fundamentals     324
mysql/basic-

In [117]:
hyperion = web_dev[web_dev.name == 'Hyperion']

In [119]:
hyperion.page_viewed.value_counts().head(20)

/                                                                            1245
toc                                                                           977
javascript-i                                                                  884
java-iii                                                                      651
java-ii                                                                       632
search/search_index.json                                                      630
mysql                                                                         517
jquery                                                                        490
java-i                                                                        444
spring                                                                        442
html-css                                                                      418
javascript-ii                                                                 410
html-css/css-i/s


In [120]:
betelgeuse = web_dev[web_dev.name == 'Betelgeuse']

In [121]:
betelgeuse.page_viewed.value_counts().head(20)

/                                                                            868
search/search_index.json                                                     718
javascript-i                                                                 686
toc                                                                          577
jquery                                                                       540
html-css/elements                                                            472
java-ii                                                                      444
java-iii                                                                     444
html-css                                                                     425
java-i                                                                       422
html-css/css-ii/bootstrap-grid-system                                        414
javascript-ii                                                                374
javascript-i/javascript-with

In [122]:
ulysses = web_dev[web_dev.name == 'Ulysses']

In [123]:
ulysses.page_viewed.value_counts().head(20)

/                                                                            1618
mkdocs/search_index.json                                                      721
html-css                                                                      555
javascript-i                                                                  523
java-ii                                                                       423
java-iii                                                                      411
spring                                                                        396
java-i                                                                        375
jquery                                                                        363
mysql                                                                         361
spring/fundamentals/form-model-binding                                        357
html-css/css-ii/bootstrap-introduction                                        351
mysql/users     

In [124]:
xanadu = web_dev[web_dev.name == 'Xanadu']

In [125]:
xanadu.page_viewed.value_counts().head(20)

/                                    916
javascript-i                         718
html-css                             587
search/search_index.json             576
jquery                               543
mysql                                507
java-ii                              501
java-iii                             495
javascript-ii                        493
java-i                               483
spring                               453
toc                                  435
html-css/elements                    433
mysql/tables                         373
appendix                             320
mysql/databases                      318
javascript-i/functions               305
mysql/basic-statements               301
java-iii/jsp-and-jstl                297
java-i/syntax-types-and-variables    291
Name: page_viewed, dtype: int64
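
The per-cohort cells above repeat the same filter and `value_counts()` call. A minimal sketch of a reusable helper follows. The function name `top_pages`, the `skip` list of navigation pages, and the tiny stand-in frame are assumptions for illustration, not part of the original notebook.

```python
import pandas as pd

# Hypothetical stand-in for the notebook's web_dev frame (same column names, made-up rows).
web_dev = pd.DataFrame({
    "name": ["Xanadu"] * 4 + ["Europa"] * 4,
    "page_viewed": ["/", "javascript-i", "javascript-i", "mysql",
                    "/", "java-iii", "java-iii", "toc"],
})

def top_pages(df, cohort, n=20, skip=("/", "toc", "search/search_index.json")):
    """Return the n most-viewed pages for one cohort, ignoring navigation pages."""
    pages = df.loc[df["name"] == cohort, "page_viewed"]
    pages = pages[~pages.isin(list(skip))]  # drop home page, table of contents, search index
    return pages.value_counts().head(n)

xanadu_top = top_pages(web_dev, "Xanadu")
europa_top = top_pages(web_dev, "Europa")
print(xanadu_top)
```

Dropping `/`, `toc`, and the search index is a design choice: those rows dominate every cohort's list and say little about which lessons were studied.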

In [126]:
europa = web_dev[web_dev.name == 'Europa']

In [127]:
europa.page_viewed.value_counts().head(20)

/                                                                            1223
toc                                                                           949
search/search_index.json                                                      734
javascript-i                                                                  478
java-iii                                                                      443
html-css/elements                                                             422
mysql                                                                         367
java-ii                                                                       366
jquery                                                                        360
html-css/css-i/selectors-and-properties                                       351
html-css                                                                      329
mysql/tables                                                                  328
mysql/databases 

In [128]:
result.head()

date,page_viewed,user_id,cohort_id,ip,name,start_date,end_date
2018-01-26 09:55:03,/,1,8,97.105.19.61,Hampton,2015-09-22,2016-02-06
2018-01-26 09:56:02,java-ii,1,8,97.105.19.61,Hampton,2015-09-22,2016-02-06
2018-01-26 09:56:05,java-ii/object-oriented-programming,1,8,97.105.19.61,Hampton,2015-09-22,2016-02-06
2018-01-26 09:56:06,slides/object_oriented_programming,1,8,97.105.19.61,Hampton,2015-09-22,2016-02-06
2018-01-26 10:40:15,javascript-i/functions,1,8,97.105.19.61,Hampton,2015-09-22,2016-02-06


In [130]:
pd.crosstab(web_dev.page_viewed, web_dev.name)

page_viewed,Andromeda,Apex,Apollo,Arches,Badlands,Bash,Betelgeuse,Ceres,Deimos,Denali,...,Quincy,Sequoia,Staff,Teddy,Ulysses,Voyageurs,Wrangell,Xanadu,Yosemite,Zion
.git,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,1,0,0,0,0
.gitignore,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,1,0,0,0,0
.well-known/assetlinks.json,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
/,1156,1244,1,622,17,532,868,1620,1319,1,...,138,630,4633,1754,1618,2098,1112,916,962,1756
00_,0,0,0,0,0,0,0,0,0,0,...,0,0,1,0,0,0,0,0,0,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
web-design/ux/purpose,8,22,0,0,0,0,32,28,11,0,...,0,1,13,0,0,1,1,2,2,1
web-dev-day-two,0,0,0,0,0,2,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
working-with-time-series-data,0,0,0,0,0,0,0,0,0,0,...,0,0,1,0,0,0,0,0,0,0
wp-admin,0,0,0,0,0,0,0,0,0,0,...,0,0,0,1,0,0,0,0,0,0


name,page_viewed,user_id,cohort_id,ip,start_date,end_date
Andromeda,/,464,31,99.203.26.208,2019-03-18,2019-07-30
Andromeda,1-fundamentals/1.1-intro-to-data-science,373,31,173.174.220.17,2019-03-18,2019-07-30
Andromeda,1-fundamentals/1.2-data-science-pipeline,373,31,173.174.220.17,2019-03-18,2019-07-30
Andromeda,1-fundamentals/1.3-pipeline-demo,373,31,173.174.220.17,2019-03-18,2019-07-30
Andromeda,1-fundamentals/AI-ML-DL-timeline.jpg,373,31,173.174.220.17,2019-03-18,2019-07-30
...,...,...,...,...,...,...
Zion,uploads/598dc43df39e2.jpg,333,29,173.173.102.182,2019-01-22,2019-06-04
Zion,web-design/intro,344,29,167.24.104.150,2019-01-22,2019-06-04
Zion,web-design/ui/visuals,336,29,72.181.99.44,2019-01-22,2019-06-04
Zion,web-design/ux/layout,336,29,72.181.99.44,2019-01-22,2019-06-04


In [135]:
wrangell =  web_dev[web_dev.name == 'Wrangell']

In [136]:
wrangell.page_viewed.value_counts().head(20)

/                                                                            1112
toc                                                                           990
javascript-i                                                                  553
search/search_index.json                                                      489
html-css                                                                      405
java-i                                                                        401
jquery                                                                        374
java-iii                                                                      335
java-ii                                                                       323
javascript-i/functions                                                        312
javascript-i/loops                                                            312
html-css/css-ii/bootstrap-grid-system                                         312
html-css/css-ii/

In [137]:
andromeda = web_dev[web_dev.name == 'Andromeda']

In [138]:
andromeda.page_viewed.value_counts().head(20)

/                                                                            1156
toc                                                                           637
javascript-i                                                                  509
spring                                                                        485
java-iii                                                                      430
java-ii                                                                       395
html-css                                                                      372
mysql                                                                         346
java-i                                                                        336
jquery                                                                        335
search/search_index.json                                                      318
mysql/tables                                                                  316
spring/fundament

In [139]:
yosemite = web_dev[web_dev.name == 'Yosemite']

In [140]:
yosemite.page_viewed.value_counts().head(20)

/                                        962
toc                                      686
javascript-i                             437
spring                                   374
html-css                                 362
search/search_index.json                 361
java-iii                                 340
javascript-ii                            332
mysql                                    318
javascript-ii/map-filter-reduce          298
jquery                                   278
mysql/tables                             277
java-ii                                  268
javascript-ii/promises                   249
javascript-i/loops                       248
java-i                                   245
html-css/css-ii/bootstrap-grid-system    242
mysql/users                              232
javascript-i/functions                   230
javascript-i/javascript-with-html        227
Name: page_viewed, dtype: int64

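**Not seeing any meaningful variance in web dev: the same lessons (javascript-i, html-css, jquery, and the Java modules) sit near the top of the list for every cohort checked above.**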
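
Eyeballing top-20 lists makes it hard to spot a lesson that one cohort used far more than the rest. One way to quantify it, sketched below, is to compare each cohort's share of traffic per page against the median share across cohorts. The frame here is a made-up stand-in with cohorts `A`, `B`, `C`; the `gap` measure is an assumption, not a method used elsewhere in this notebook.

```python
import pandas as pd

# Hypothetical stand-in for web_dev: three cohorts, three pages.
web_dev = pd.DataFrame({
    "name": ["A"] * 4 + ["B"] * 4 + ["C"] * 4,
    "page_viewed": ["spring", "spring", "spring", "mysql",
                    "spring", "mysql", "mysql", "jquery",
                    "spring", "mysql", "jquery", "jquery"],
})

# Share of each cohort's traffic that went to each page (every column sums to 1).
share = pd.crosstab(web_dev["page_viewed"], web_dev["name"], normalize="columns")

# How far each cohort's share sits above the median share for that page.
gap = share.sub(share.median(axis=1), axis=0)

# For each page: the cohort with the largest positive gap, sorted by that gap.
summary = pd.DataFrame({
    "cohort": gap.idxmax(axis=1),
    "gap": gap.max(axis=1),
}).sort_values("gap", ascending=False)
print(summary)
```

Normalizing by column keeps large cohorts from dominating the comparison simply because they generated more requests.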

# Are there students who, when active, hardly access the curriculum? If so, what information do you have about these students?

In [142]:
data_science.head()

date,page_viewed,user_id,cohort_id,ip,name,start_date,end_date
2019-08-20 09:39:58,/,466,34,97.105.19.58,Bayes,2019-08-19,2020-01-30
2019-08-20 09:39:59,/,467,34,97.105.19.58,Bayes,2019-08-19,2020-01-30
2019-08-20 09:39:59,/,468,34,97.105.19.58,Bayes,2019-08-19,2020-01-30
2019-08-20 09:40:02,/,469,34,97.105.19.58,Bayes,2019-08-19,2020-01-30
2019-08-20 09:40:08,/,470,34,97.105.19.58,Bayes,2019-08-19,2020-01-30


In [145]:
darden.user_id.value_counts() < 20

685    False
698    False
689    False
699    False
681    False
692    False
688    False
691    False
682    False
678    False
696    False
684    False
680    False
268    False
687    False
686    False
695    False
690    False
694    False
739    False
693    False
683    False
781    False
783    False
780    False
785    False
697     True
679     True
Name: user_id, dtype: bool
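
The boolean Series above has to be scanned by eye for `True`. A minimal sketch that keeps only the users under the threshold follows. The threshold of 20 comes from the cell above, and the counts for users 697 and 679 match the shapes reported further down; the rows for user 685 are made up so the example runs on its own.

```python
import pandas as pd

# Hypothetical stand-in for the darden frame: one busy user, two quiet ones.
darden = pd.DataFrame({"user_id": [685] * 25 + [697] * 13 + [679] * 11})

hits = darden["user_id"].value_counts()   # requests per user, busiest first
low_activity = hits[hits < 20]            # keep only users under the threshold
print(low_activity)
```

`low_activity.index` can then be passed to `darden[darden.user_id.isin(low_activity.index)]` to pull every row for those users in one step.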

In [146]:
user1 = darden[darden.user_id == 697]

In [148]:
user1.shape

(13, 7)

In [149]:
user1

date,page_viewed,user_id,cohort_id,ip,name,start_date,end_date
2020-07-13 15:20:27,/,697,59,136.50.70.27,Darden,2020-07-13,2021-01-12
2020-07-13 15:20:48,3-sql/1-mysql-overview,697,59,136.50.70.27,Darden,2020-07-13,2021-01-12
2020-07-13 15:20:50,1-fundamentals/1.1-intro-to-data-science,697,59,136.50.70.27,Darden,2020-07-13,2021-01-12
2020-07-13 15:20:50,1-fundamentals/modern-data-scientist.jpg,697,59,136.50.70.27,Darden,2020-07-13,2021-01-12
2020-07-13 15:20:50,1-fundamentals/AI-ML-DL-timeline.jpg,697,59,136.50.70.27,Darden,2020-07-13,2021-01-12
2020-07-13 15:20:59,1-fundamentals/1.2-data-science-pipeline,697,59,136.50.70.27,Darden,2020-07-13,2021-01-12
2020-07-13 15:20:59,1-fundamentals/DataToAction_v2.jpg,697,59,136.50.70.27,Darden,2020-07-13,2021-01-12
2020-07-13 15:21:01,1-fundamentals/1.1-intro-to-data-science,697,59,136.50.70.27,Darden,2020-07-13,2021-01-12
2020-07-13 15:21:02,1-fundamentals/AI-ML-DL-timeline.jpg,697,59,136.50.70.27,Darden,2020-07-13,2021-01-12
2020-07-13 15:21:02,1-fundamentals/modern-data-scientist.jpg,697,59,136.50.70.27,Darden,2020-07-13,2021-01-12


In [150]:
user2 = darden[darden.user_id == 679]

In [151]:
user2.shape

(11, 7)

In [152]:
user2

date,page_viewed,user_id,cohort_id,ip,name,start_date,end_date
2020-07-13 14:37:22,/,679,59,24.28.146.155,Darden,2020-07-13,2021-01-12
2020-07-13 14:39:21,13-advanced-topics/1-tidy-data,679,59,24.28.146.155,Darden,2020-07-13,2021-01-12
2020-07-13 14:39:36,1-fundamentals/1.1-intro-to-data-science,679,59,24.28.146.155,Darden,2020-07-13,2021-01-12
2020-07-13 14:39:37,1-fundamentals/AI-ML-DL-timeline.jpg,679,59,24.28.146.155,Darden,2020-07-13,2021-01-12
2020-07-13 14:39:37,1-fundamentals/modern-data-scientist.jpg,679,59,24.28.146.155,Darden,2020-07-13,2021-01-12
2020-07-13 15:49:31,1-fundamentals/1.1-intro-to-data-science,679,59,24.28.146.155,Darden,2020-07-13,2021-01-12
2020-07-13 15:49:32,1-fundamentals/modern-data-scientist.jpg,679,59,24.28.146.155,Darden,2020-07-13,2021-01-12
2020-07-13 15:49:32,1-fundamentals/AI-ML-DL-timeline.jpg,679,59,24.28.146.155,Darden,2020-07-13,2021-01-12
2020-07-14 08:05:15,1-fundamentals/1.1-intro-to-data-science,679,59,24.28.146.155,Darden,2020-07-13,2021-01-12
2020-07-14 08:05:15,1-fundamentals/AI-ML-DL-timeline.jpg,679,59,24.28.146.155,Darden,2020-07-13,2021-01-12


In [153]:
curie.user_id.value_counts() < 20

581    False
576    False
590    False
584    False
580    False
582    False
579    False
585    False
586    False
589    False
617    False
591    False
578    False
588    False
616    False
575    False
587    False
583    False
577    False
746     True
787     True
Name: user_id, dtype: bool

In [154]:
user3 = curie[curie.user_id == 746]

In [155]:
user3.shape

(1, 7)

In [156]:
user3

date,page_viewed,user_id,cohort_id,ip,name,start_date,end_date
2020-09-10 10:50:28,/,746,55,173.175.100.201,Curie,2020-02-03,2020-07-07


In [158]:
user4 = curie[curie.user_id == 787]

In [159]:
user4

date,page_viewed,user_id,cohort_id,ip,name,start_date,end_date
2020-10-29 22:08:03,appendix/interview_questions_students,787,55,99.126.113.140,Curie,2020-02-03,2020-07-07
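
The question asks about students "when active". Both Curie hits above (2020-09-10 and 2020-10-29) fall after the cohort's `end_date` of 2020-07-07, so they look like post-graduation visits rather than in-class inactivity. A sketch of separating the two cases follows. It assumes the frame has a `DatetimeIndex` and string or datetime `start_date`/`end_date` columns, as in the outputs above; the two rows for user 575 are made up for illustration.

```python
import pandas as pd

# Hypothetical miniature of the curie frame: DatetimeIndex plus cohort start/end dates.
curie = pd.DataFrame(
    {
        "user_id": [575, 575, 746, 787],
        "start_date": ["2020-02-03"] * 4,
        "end_date": ["2020-07-07"] * 4,
    },
    index=pd.to_datetime([
        "2020-02-04 09:00:00", "2020-03-01 10:00:00",   # made-up in-session hits
        "2020-09-10 10:50:28", "2020-10-29 22:08:03",   # timestamps from the outputs above
    ]),
)

ts = curie.index.to_numpy()
start = pd.to_datetime(curie["start_date"]).to_numpy()
# Add one day so requests made on the final day of class still count as in session.
end = (pd.to_datetime(curie["end_date"]) + pd.Timedelta(days=1)).to_numpy()

in_session = (ts >= start) & (ts < end)   # request made while the cohort was running
active = curie[in_session]
after_grad = curie[ts >= end]             # request made after graduation

print(active["user_id"].value_counts())
print(after_grad["user_id"].value_counts())
```

Running the `value_counts() < 20` check on `active` rather than on the full frame would limit the low-activity list to requests made during the program.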


In [160]:
bayes.user_id.value_counts() < 20

485    False
475    False
476    False
479    False
478    False
482    False
471    False
469    False
466    False
473    False
481    False
358    False
484    False
480    False
483    False
472    False
468    False
474    False
467    False
470    False
477    False
487     True
650     True
Name: user_id, dtype: bool

In [None]:
# 487 is one of the two Bayes users flagged above with fewer than 20 hits
user5 = bayes[bayes.user_id == 487]