# Summary

This notebook loops through the five phases of the Flatiron curriculum and collects the GitHub repo link for every lesson that has one.

Note that this only works if you have access to the curriculum.

In [127]:
import pandas as pd
from bs4 import BeautifulSoup
import json
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
import urllib

In [131]:
# dict mapping each phase to the URL of its main page

phases = {
    'phase1': 'https://learning.flatironschool.com/courses/2680',
    'phase2': 'https://learning.flatironschool.com/courses/2681',
    'phase3': 'https://learning.flatironschool.com/courses/2682',
    'phase4': 'https://learning.flatironschool.com/courses/2683',
    'phase5': 'https://learning.flatironschool.com/courses/2684',
}

lesson_root = 'https://learning.flatironschool.com'

results = []

Be sure to download the ChromeDriver release that matches your installed version of Chrome and extract it somewhere locally.

**Replace the path below with the path to the extracted chromedriver executable before proceeding.**

In [160]:
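## REPLACE THE PATH WITH YOUR OWN ##
driver_path = '/Users/jessicamiles/Downloads/chromedriver'

# initiate webdriver. This should launch a Chrome window controlled by Selenium
# (Selenium 4 takes the driver path through a Service object rather than a positional argument)
from selenium.webdriver.chrome.service import Service
driver = webdriver.Chrome(service=Service(executable_path=driver_path))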

A Chrome window should have launched. The next cell loads the Flatiron login page in that window; log in there using your own credentials. You don't need to navigate anywhere else after that; the scraping cell will take it from there.

In [132]:
# get Selenium window to load Flatiron Login
# log in using your credentials
login_url = 'https://portal.flatironschool.com'
driver.get(login_url)
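
Because the login is done by hand, the scraping cell should not run until the login has finished. One way to make that wait explicit is to poll until some condition holds. The helper below is a generic sketch, not part of the original notebook: the name `wait_until` is made up, and the URL check in the usage comment is an assumption about where the browser ends up after logging in.

```python
import time


def wait_until(predicate, timeout=300.0, interval=1.0):
    """Poll `predicate` until it returns a truthy value or `timeout` seconds pass.

    Returns True if the predicate succeeded, False if the timeout expired first.
    """
    deadline = time.monotonic() + timeout
    while True:
        if predicate():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


# Hypothetical usage with the Selenium driver from the cells above.
# ASSUMPTION: a successful login moves the browser away from the login URL;
# adjust the check to whatever page you actually land on.
# logged_in = wait_until(lambda: not driver.current_url.startswith(login_url))
```

If the check is hard to pin down, simply waiting to run the next cell until you see the logged-in page works just as well.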

In [136]:
# loop through phase home pages, topics, and lessons, and get GitHub links
for phase, phase_url in phases.items():

    # get phase module main page
    driver.get(phase_url)
    
    # make initial soup out of the phase page
    soup = BeautifulSoup(driver.page_source, 'html.parser')
    
    # get main modules div
    mods = soup.find(id='context_modules')

    # loop through each topic
    for topic in mods.find_all(class_='context_module'):

        # loop through each lesson listed in the topic
        for lesson in topic.find_all(class_='ig-row'):

            if lesson.a:

                # navigate to lesson page to find if it has a repo link
                driver.get(lesson_root + lesson.a['href'])
                curr_url = driver.current_url

                soup_sub = BeautifulSoup(driver.page_source, 'html.parser')

                # try to find the image to open the GitHub repo, if exists
                repos = soup_sub.find_all('a', class_='fis-git-link', limit=1)
                if len(repos) > 0:
                    repo = repos[0]['href']
                else:
                    repo = ''

                # append lesson info to results
                results.append({'phase': phase,
                                'topic_name': topic['aria-label'],
                                'lesson_title': lesson.a['aria-label'],
                                'lesson_url': curr_url,
                                'github_repo': repo})
        
        # dump results out to file after each topic, overwriting the file
        # (the with block closes the file on exit)
        with open('results.json', 'w') as f:
            json.dump(results, f)
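
The loop builds each lesson URL by concatenating `lesson_root` with the link's `href`, which works as long as every `href` is a root-relative path. A slightly more defensive sketch uses `urllib.parse.urljoin` from the standard library, which also copes with an `href` that is already absolute. The helper name `lesson_url` and the sample paths are made up for illustration.

```python
from urllib.parse import urljoin

lesson_root = 'https://learning.flatironschool.com'


def lesson_url(href, root=lesson_root):
    """Resolve a lesson link's href against the site root."""
    return urljoin(root, href)


# Root-relative href (hypothetical path): resolved against the site root.
print(lesson_url('/courses/2680/modules/items/1'))
# Already-absolute href: returned unchanged instead of being glued onto the root.
print(lesson_url('https://github.com/example/repo'))
```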

In [158]:
df = pd.DataFrame(results)
df.head()

|   | phase | topic_name | lesson_title | lesson_url | github_repo |
|---|-------|------------|--------------|------------|-------------|
| 0 | phase1 | 🏆 Online Milestones Instructions | Phase 1 Project Templates and Examples | https://learning.flatironschool.com/courses/26... | |
| 1 | phase1 | 🏆 Milestones | Phase 1 Blog Post | https://learning.flatironschool.com/courses/26... | |
| 2 | phase1 | Activities & Assignments | Feedback: Assigning CL Recordings Before S.G. | https://learning.flatironschool.com/courses/26... | |
| 3 | phase1 | Phase 1 - Supplemental Resources & Videos | 📊Choosing the Right Visualization | https://learning.flatironschool.com/courses/26... | |
| 4 | phase1 | 📺Phase 1 - Campus Central Lecturer Recordings | Topic 10: Webscraping & HTML/CSS | https://learning.flatironschool.com/courses/26... | |


In [157]:
df.to_csv('results.csv')
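
Lessons without a repo keep an empty `github_repo` string, so the rows of interest are the non-empty ones. Below is a minimal sketch of filtering and counting them, working on a plain list of dicts shaped like `results`. The sample rows and the helper name `repos_per_phase` are invented for illustration.

```python
from collections import Counter


def repos_per_phase(rows):
    """Count, per phase, the lessons that have a non-empty `github_repo`."""
    return Counter(row['phase'] for row in rows if row['github_repo'])


# Made-up rows in the same shape as the entries appended to `results`.
sample = [
    {'phase': 'phase1', 'lesson_title': 'A', 'github_repo': 'https://github.com/example/a'},
    {'phase': 'phase1', 'lesson_title': 'B', 'github_repo': ''},
    {'phase': 'phase2', 'lesson_title': 'C', 'github_repo': 'https://github.com/example/c'},
]

# Keep only the lessons that actually link to a repo.
with_repo = [row for row in sample if row['github_repo']]
print(len(with_repo))  # 2
print(repos_per_phase(sample))
```

The same filter can be applied to the real data by passing `results` instead of `sample`.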

# Extra

I initially tried this with the `requests` library, but managing the authentication proved too difficult. I did find some neat tricks, though.

Create a .json file containing YOUR Flatiron login credentials in the following format:

    {"user[email]": "[Your Email Username]",
     "user[password]": "[Your Password]"}

Update the path in the next cell to point to that JSON file. Be sure to use the keys exactly as shown above.
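
Since the login only works if both keys are spelled exactly as shown, it can help to check the file when loading it. The following is a small sketch of that check, not the notebook's own code; the function name `load_creds` and the constant `REQUIRED_KEYS` are hypothetical.

```python
import json
import os

# The two keys the credentials file is expected to contain.
REQUIRED_KEYS = ('user[email]', 'user[password]')


def load_creds(path):
    """Load the credentials JSON and check that both expected keys are present."""
    with open(os.path.expanduser(path)) as f:
        creds = json.load(f)
    missing = [key for key in REQUIRED_KEYS if key not in creds]
    if missing:
        raise KeyError(f'credentials file is missing keys: {missing}')
    return creds
```

With the file location used in this notebook, the call would be `creds = load_creds('~/.secret/flatiron.json')`.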


In [43]:
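import os

# read the login credentials from the JSON file; the with block closes the file afterwards
with open(os.path.expanduser('~') + '/.secret/flatiron.json') as f:
    creds = json.load(f)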

In [95]:
# Create and authenticate into session
# modified from https://stackoverflow.com/questions/50261869/python-requests-422-error-on-post
import requests

login_url = 'https://portal.flatironschool.com'

s = requests.Session()

r = s.get(login_url)
soup = BeautifulSoup(r.content, "lxml")

# find other hidden fields that will be needed in addition to username and password
hidden = soup.find_all("input", {'type':'hidden'})

# get the actual target URL that will be posted to
target = login_url + soup.find("form")['action']

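# grab variable names and values from hidden fields
# (skip inputs without a name; a hidden input without a value is sent as an empty string)
payload = {x["name"]: x.get("value", "") for x in hidden if x.get("name")}

# add login creds to the dict
payload.update(creds)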

# post creds with hidden info and log in
r = s.post(target, data=payload)
print(r)

<Response [200]>
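
To see what the hidden-field trick produces, here is a dependency-free sketch that runs without a network connection. It swaps BeautifulSoup for the standard library's `html.parser` so it needs no extra installs. The sample form, its `authenticity_token` field, the `action` path, and the credentials are all made up; the real login page may look different.

```python
from html.parser import HTMLParser


class HiddenInputCollector(HTMLParser):
    """Collect name/value pairs from <input type="hidden"> tags."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == 'input' and attrs.get('type') == 'hidden' and 'name' in attrs:
            # A hidden input without a value attribute is stored as an empty string.
            self.fields[attrs['name']] = attrs.get('value') or ''


# Made-up login form; the real page's field names and action may differ.
sample_form = '''
<form action="/users/sign_in" method="post">
  <input type="hidden" name="authenticity_token" value="abc123">
  <input type="hidden" name="utf8">
  <input type="email" name="user[email]">
  <input type="password" name="user[password]">
</form>
'''

collector = HiddenInputCollector()
collector.feed(sample_form)

# Start from the hidden fields, then add the (fake) credentials on top.
payload = dict(collector.fields)
payload.update({'user[email]': 'me@example.com', 'user[password]': 'not-a-real-password'})
print(payload)
```

Only the hidden inputs are collected from the form; the visible email and password fields enter the payload through the credentials dict, mirroring the `payload.update(creds)` step above.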
