# Day 14
Today builds on Day 13 by adding two more fields to the table: the current week of the season and the date and time the tier list was last updated. Both values appear only in an image on the Boris Chen website, so I need to extract the text from the .PNG, clean it up, and add it to the table.

I ended up using the [pytesseract library](https://pypi.org/project/pytesseract/), an optical character recognition (OCR) tool for Python that recognizes and “reads” the text embedded in images. Python-tesseract is a wrapper for Google’s [Tesseract-OCR Engine](https://github.com/tesseract-ocr/tesseract).

## Set Up

In [1]:
from bs4 import BeautifulSoup as Soup
from PIL import Image
import pandas as pd
import requests
import dataframe_image as dfi
import pytesseract
import shutil
import os

## Tier List Table

In [2]:
# Scrape
url = "http://www.borischen.co/p/half-05-5-ppr-running-back-tier-rankings.html"
response = requests.get(url)

# Parse (reuse the response instead of requesting the page a second time)
soup = Soup(response.content, "html.parser")

In [3]:
# Look for where the table is stored
soup.find("object")

<object data="https://s3-us-west-1.amazonaws.com/fftiers/out/text_RB-HALF.txt" style="height: 100%; margin: 1%; width: 100%;" type="text/html"></object>

In [4]:
# Get url from the data attribute of the object tag
url = soup.find("object")['data']
url

'https://s3-us-west-1.amazonaws.com/fftiers/out/text_RB-HALF.txt'

## OCR Testing

In [5]:
url = 'https://s3-us-west-1.amazonaws.com/fftiers/out/weekly-WR-HALF.png'
img_name = url.split("/")[-1]
response = requests.get(url, stream=True)
with open(img_name, 'wb') as out_file:
    shutil.copyfileobj(response.raw, out_file)
del response

In [6]:
# Run OCR on the image downloaded in the previous cell
text = pytesseract.image_to_string(Image.open(img_name))

print(text)

Expert Consensus Rank

-20 -

-40-

-60-

Week 13 - WR-HALF Tiers - Sat Dec 03 2022 09:10 PST

Davante Adams ©
Tyreek Hill -
Justin Jefferson -
Stefon Diggs -@-
AJ. Brown =@=
CeeDee Lamb ~-
Amon-Ra St. Brown -®
Jaylen Waddie ~e=
Tee Higgins -®
Ja'Marr Chase. ——e—=
Amari Cooper. —®—=
Chris Godwin —e=
Christian Kirk = —e—
DK Metcalf ~e-
Keenan Allen =
Garrett Wilson ——
Mike Evans —e—
Terry McLaurin ===
Tyler Lockett ~e=
Deebo Samuel
Chris Olave —e—
Brandon Aiyuk —e—
DeVonta Smith —e—
Christian Watson —=@==
Michael Pittman Jr. -@=
JuJu Smith-Schuster —§ ———=
Courtland Sutton = —e=
Joshua Palmer ——e—
Gabe Davis —e—
George Pickens = ——®—=
Allen Lazard = —e—
Zay Jones. —@—
Jakobi Meyers. —@—
Diontae Johnson ==
Treylon Burks —e—
Darius Slayton —o=
‘Adam Thielen —@-~
Donovan Peoples-Jones = —@—
Tyler Boyd —e—
Drake London ===
Nico Collins = —e—
Michael Gallup = ——
Mack Hollins —=-®—
Curtis Samuel ——e—
Parris Campbell ——
Marquez Valdes-Scantling  —-®=
Isaiah McKenzie = —®—~
DeAndre Carter —————

In [7]:
# Find the header line, which starts with "Week"
week_time = ''

for line in text.split("\n"):
    if line.startswith('Week'):
        week_time = line

# Header format: "<week> - <position/scoring> Tiers - <updated timestamp>"
parts = week_time.split(" - ")
print(parts)

week = parts[0]
time = parts[2]

print(week)
print(time)

# Remove image after use
os.remove(img_name)


['Week 13', 'WR-HALF Tiers', 'Sat Dec 03 2022 09:10 PST']
Week 13
Sat Dec 03 2022 09:10 PST
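
Splitting on `" - "` works only when the OCR reproduces the separators exactly. As a more defensive alternative, the sketch below parses the header line with a regular expression and converts the timestamp to a `datetime`. The function name `parse_header`, the returned dictionary keys, and the tolerance for en or em dashes are assumptions for illustration, not part of the original notebook.

```python
import re
from datetime import datetime

# Hypothetical helper: matches lines like
# "Week 13 - WR-HALF Tiers - Sat Dec 03 2022 09:10 PST".
# The character class also accepts en/em dashes in case OCR misreads a hyphen.
HEADER_RE = re.compile(
    r"^Week\s+(\d+)\s*[-\u2013\u2014]\s*(\S+)\s+Tiers\s*[-\u2013\u2014]\s*(.+)$"
)


def parse_header(ocr_text):
    """Return week, label, and update time from OCR text, or None if absent."""
    for line in ocr_text.splitlines():
        match = HEADER_RE.match(line.strip())
        if match is None:
            continue
        week, label, stamp = match.groups()
        # Split off the trailing zone abbreviation ("PST"). strptime's %Z only
        # accepts a limited set of names that depends on the local machine,
        # so the abbreviation is kept as plain text here.
        clock, _, zone = stamp.strip().rpartition(" ")
        updated = datetime.strptime(clock, "%a %b %d %Y %H:%M")
        return {"week": int(week), "label": label, "updated": updated, "zone": zone}
    return None


sample = "Expert Consensus Rank\n\nWeek 13 - WR-HALF Tiers - Sat Dec 03 2022 09:10 PST\n\nDavante Adams"
header = parse_header(sample)
print(header)
```

Returning `None` when no header is found lets the caller decide how to handle a failed OCR read instead of hitting an `IndexError`.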


## Extracting the Data

In [8]:
def get_player_tiers(position, scoring):

    """
    position: 'QB', 'RB', 'WR', 'TE', or 'FLX'
    scoring:
        "STAN": standard
        "HALF": half-ppr
        "PPR": ppr
    """

    # Build URL for tier table and image
    if position == "QB":
        url_tiers = f"https://s3-us-west-1.amazonaws.com/fftiers/out/text_{position}.txt"
        url_img = f"https://s3-us-west-1.amazonaws.com/fftiers/out/weekly-{position}.png"
    else:
        if scoring == "STAN":
            url_tiers = f"https://s3-us-west-1.amazonaws.com/fftiers/out/text_{position}.txt"
            url_img = f"https://s3-us-west-1.amazonaws.com/fftiers/out/weekly-{position}.png"
        else:   
            url_tiers = f"https://s3-us-west-1.amazonaws.com/fftiers/out/text_{position}-{scoring}.txt"
            url_img = f"https://s3-us-west-1.amazonaws.com/fftiers/out/weekly-{position}-{scoring}.png"
    
    # Get tier table with player info
    table = requests.get(url_tiers).text
    
    # Get image with week and update time
    img_name = url_img.split("/")[-1]
    tier_img = requests.get(url_img, stream=True)
    with open(img_name, 'wb') as out_file:
        shutil.copyfileobj(tier_img.raw, out_file)
    del tier_img

    # Clean up tier table
    temp = [x.strip() for x in table.replace("\n",",").split(",")]
    
    # Get week and time from image
    text = pytesseract.image_to_string(Image.open(img_name))
    
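    # The header line starts with "Week", e.g. "Week 13 - WR-HALF Tiers - Sat Dec 03 2022 09:10 PST"
    week_time = ''

    for line in text.split("\n"):
        if line.startswith('Week'):
            week_time = line

    parts = week_time.split(" - ")
    if len(parts) < 3:
        raise ValueError(f"Could not find the week/update header in the OCR text for {img_name}")

    week = parts[0]
    time = parts[2]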

    # Remove image after use
    os.remove(img_name)

    # Get data into containers for saving into a DataFrame
    data = {}
    player_names = []
    tiers = []

    current_tier = 1

    for i in temp[:-1]:
        if i[:4] == "Tier":
            
            current_tier = int(i.split(":")[0].split(" ")[1])

            player_names.append(i.split(":")[1].strip())
            tiers.append(current_tier)
        else:
            player_names.append(i)
            tiers.append(current_tier)

    data['player_name'] = player_names
    data['position'] = [position] * len(player_names)
    data['scoring'] = [scoring] * len(player_names)
    data['week'] = [week] * len(player_names)
    data['tier'] = tiers
    data['updated'] = [time] * len(player_names)
    
    return pd.DataFrame(data)

def get_my_players(ds, player_list, scoring_list):
    
    f_players = ds['player_name'].isin(player_list)
    f_scoring = ds['scoring'].isin(scoring_list)

    # Keep only my players, sorted by position, scoring, and tier
    result = ds[f_players & f_scoring].sort_values(['position', 'scoring', 'tier'], ascending=[False, True, True])

    # Shift the 0-based index by one so it reads as a 1-based rank
    result.index = result.index + 1
    
    return result
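
The nested URL branching at the top of `get_player_tiers` boils down to one rule: QB pages and standard scoring use the bare position, and everything else appends the scoring code. A minimal sketch of that rule as a standalone helper follows; `build_urls` and `BASE_URL` are hypothetical names, and the URL pattern is taken from the function above.

```python
BASE_URL = "https://s3-us-west-1.amazonaws.com/fftiers/out"


def build_urls(position, scoring):
    """Return (tier_text_url, tier_image_url) for a position and scoring code."""
    # QB tiers and standard scoring have no scoring suffix in the file name.
    if position == "QB" or scoring == "STAN":
        slug = position
    else:
        slug = f"{position}-{scoring}"
    return f"{BASE_URL}/text_{slug}.txt", f"{BASE_URL}/weekly-{slug}.png"


print(build_urls("RB", "HALF"))
print(build_urls("QB", "HALF"))
```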

In [9]:
get_player_tiers('RB', 'HALF').head()

| | player_name | position | scoring | week | tier | updated |
|---|---|---|---|---|---|---|
| 0 | Austin Ekeler | RB | HALF | Week 13 | 1 | Sat Dec 03 2022 09:10 PST |
| 1 | Josh Jacobs | RB | HALF | Week 13 | 1 | Sat Dec 03 2022 09:10 PST |
| 2 | Nick Chubb | RB | HALF | Week 13 | 1 | Sat Dec 03 2022 09:10 PST |
| 3 | Derrick Henry | RB | HALF | Week 13 | 1 | Sat Dec 03 2022 09:10 PST |
| 4 | Rhamondre Stevenson | RB | HALF | Week 13 | 1 | Sat Dec 03 2022 09:10 PST |
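
The tier-parsing step inside `get_player_tiers` can also be exercised on its own, without any network or OCR calls. The sketch below assumes the text file has one tier per line in the form `Tier N: name, name, ...`, which is inferred from the parsing code above rather than from any documented format. `parse_tiers` and the sample string are hypothetical, and unlike the original loop this version ignores lines that do not start with `Tier`.

```python
def parse_tiers(text):
    """Parse lines like 'Tier 1: A, B, C' into (player_name, tier) pairs."""
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("Tier"):
            continue  # skip blank or unexpected lines
        label, _, names = line.partition(":")
        tier = int(label.split()[1])
        for name in names.split(","):
            name = name.strip()
            if name:
                rows.append((name, tier))
    return rows


# Illustrative sample only; the real file is fetched from the S3 URL.
sample = "Tier 1: Austin Ekeler, Josh Jacobs, Nick Chubb\nTier 4: Tony Pollard\n"
pairs = parse_tiers(sample)
print(pairs)
```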


## Get player rankings

Imagine you are playing in a 0.5 PPR league and want to know where your QBs, RBs, WRs, TEs, and FLEX players stand. You can do the following:

In [10]:
# Positions you have on your team
positions = ['QB', 'RB', 'WR', 'FLX', 'TE']

# League scoring
# Add more if you are in multiple leagues with different scoring systems
scoring_systems = ['HALF']

# Get data
datasets = []

for scoring in scoring_systems:
    for pos in positions:
        datasets.append(get_player_tiers(pos, scoring))
    

all_players = pd.concat(datasets)
all_players.head()

| | player_name | position | scoring | week | tier | updated |
|---|---|---|---|---|---|---|
| 0 | Patrick Mahomes II | QB | HALF | Week 13 | 1 | Sat Dec 03 2022 09:10 PST |
| 1 | Jalen Hurts | QB | HALF | Week 13 | 1 | Sat Dec 03 2022 09:10 PST |
| 2 | Josh Allen | QB | HALF | Week 13 | 1 | Sat Dec 03 2022 09:10 PST |
| 3 | Joe Burrow | QB | HALF | Week 13 | 2 | Sat Dec 03 2022 09:10 PST |
| 4 | Justin Herbert | QB | HALF | Week 13 | 2 | Sat Dec 03 2022 09:10 PST |


### Only your players

In [13]:
my_players = [
        'Josh Allen', 
        'Chris Godwin', 
        'Joshua Palmer',
        'CeeDee Lamb', 
        'Josh Jacobs', 
        'Khalil Herbert', 
        'Michael Pittman Jr.', 
        'Justin Herbert',
        'Tony Pollard', 
        'T.J. Hockenson',
        'Nick Chubb',
        'Gus Edwards']

# Index will be the overall rank per position, scoring system
my_player_rankings = get_my_players(all_players, my_players, ['HALF'])
my_player_rankings

| | player_name | position | scoring | week | tier | updated |
|---|---|---|---|---|---|---|
| 6 | CeeDee Lamb | WR | HALF | Week 13 | 1 | Sat Dec 03 2022 09:10 PST |
| 12 | Chris Godwin | WR | HALF | Week 13 | 2 | Sat Dec 03 2022 09:10 PST |
| 25 | Michael Pittman Jr. | WR | HALF | Week 13 | 3 | Sat Dec 03 2022 09:10 PST |
| 28 | Joshua Palmer | WR | HALF | Week 13 | 4 | Sat Dec 03 2022 09:10 PST |
| 3 | T.J. Hockenson | TE | HALF | Week 13 | 2 | Sat Dec 03 2022 09:10 PST |
| 2 | Josh Jacobs | RB | HALF | Week 13 | 1 | Sat Dec 03 2022 09:10 PST |
| 3 | Nick Chubb | RB | HALF | Week 13 | 1 | Sat Dec 03 2022 09:10 PST |
| 15 | Tony Pollard | RB | HALF | Week 13 | 4 | Sat Dec 03 2022 09:10 PST |
| 28 | Gus Edwards | RB | HALF | Week 13 | 7 | Sat Dec 03 2022 09:10 PST |
| 3 | Josh Allen | QB | HALF | Week 13 | 1 | Sat Dec 03 2022 09:10 PST |
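
The index works as a per-position rank because `pd.concat` keeps each dataset's own 0-based index by default, so numbering restarts for every position, and `get_my_players` then adds 1. A small sketch of that behavior follows; the player names come from the tables above, but the tiny frames themselves are made up for illustration.

```python
import pandas as pd

# Two tiny stand-ins for the per-position frames returned by get_player_tiers
qbs = pd.DataFrame({"player_name": ["Patrick Mahomes II", "Jalen Hurts", "Josh Allen"], "position": "QB"})
rbs = pd.DataFrame({"player_name": ["Austin Ekeler", "Josh Jacobs"], "position": "RB"})

# concat keeps each frame's own index (0, 1, 2 and 0, 1) unless ignore_index=True
combined = pd.concat([qbs, rbs])

mine = combined[combined["player_name"].isin(["Josh Allen", "Josh Jacobs"])].copy()
mine.index = mine.index + 1  # 0-based position in the tier list -> 1-based rank

print(mine)
```

Passing `ignore_index=True` to `pd.concat` would renumber the rows across all positions, and the index could no longer be read as a rank within each position.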


In [12]:
# Save table for Twitter post
my_player_rankings.dfi.export('../twitter/day14_table.png')

