# Day 14
Today builds on Day 13 by adding two pieces of data to the table: the current week of the season and the date and time the tier list was last updated. Both appear only in an image on the Boris Chen website, so I need to extract the text from the PNG, clean it up, and add it to the table.

I ended up using the [pytesseract library](https://pypi.org/project/pytesseract/), an optical character recognition (OCR) tool for Python. It recognizes and “reads” the text embedded in images. Python-tesseract is a wrapper for Google’s [Tesseract-OCR Engine](https://github.com/tesseract-ocr/tesseract).

## Set Up

In [40]:
from bs4 import BeautifulSoup as Soup
from PIL import Image
import pandas as pd
import requests
import dataframe_image as dfi
import pytesseract
import shutil
import os

## Tier List Table

In [41]:
# Scrape
url = "http://www.borischen.co/p/half-05-5-ppr-running-back-tier-rankings.html"
response = requests.get(url)

# Parse (reuse the response instead of requesting the page a second time)
soup = Soup(response.content, "html.parser")

In [42]:
# Look for where the table is stored
soup.find("object")

<object data="https://s3-us-west-1.amazonaws.com/fftiers/out/text_RB-HALF.txt" style="height: 100%; margin: 1%; width: 100%;" type="text/html"></object>

In [43]:
# Get the URL from the object's data attribute
url = soup.find("object")['data']
url

'https://s3-us-west-1.amazonaws.com/fftiers/out/text_RB-HALF.txt'
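
If BeautifulSoup is not available, the same `data` attribute can be pulled out with the standard library's `html.parser`. This is an alternative sketch, not what this notebook uses: the class name `ObjectDataParser` is made up for illustration, and the HTML string is the `<object>` tag shown above.

```python
from html.parser import HTMLParser


class ObjectDataParser(HTMLParser):
    """Collect the `data` attribute of every <object> tag (illustrative helper)."""

    def __init__(self):
        super().__init__()
        self.data_urls = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs
        if tag == "object":
            data = dict(attrs).get("data")
            if data:
                self.data_urls.append(data)


html = (
    '<object data="https://s3-us-west-1.amazonaws.com/fftiers/out/text_RB-HALF.txt" '
    'style="height: 100%; margin: 1%; width: 100%;" type="text/html"></object>'
)

parser = ObjectDataParser()
parser.feed(html)
print(parser.data_urls)
```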

## OCR Testing

In [44]:
url = 'https://s3-us-west-1.amazonaws.com/fftiers/out/weekly-WR-HALF.png'
img_name = url.split("/")[-1]
response = requests.get(url, stream=True)
with open(img_name, 'wb') as out_file:
    shutil.copyfileobj(response.raw, out_file)
del response

In [45]:
# Run OCR on the image downloaded in the previous cell
text = pytesseract.image_to_string(Image.open(img_name))

print(text)

Expert Consensus Rank

-20 -

-40-

-60-

Week 9 - WR-HALF Tiers - Sun Nov 06 2022 10:23 PST

Tyreek Hill -@
Justin Jefferson -@
Stefon Diggs -e-
Cooper Kupp ~®
DeAndre Hopkins -®=
Davante Adams -®
AJ. Brown -®
Tee Higgins -@
Jaylen Waddle ~®=
Amon-Ra St. Brown ==
Mike Evans -@
Chris Godwin =
Chris Olave —e—
DJ Moore ~e—
DK Metcalf —°—
Tyler Lockett ~e=
Terry McLaurin —@=
Gabe Davis —e—
Tyler Boyd ~~
Christian Kirk =@=
JuJu Smith-Schuster — —=®=
DeVonta Smith —e=
Jakobi Meyers ~@=
Michael Pittman Jr. —=@—=
Joshua Palmer ——e—
‘Adam Thielen -@=
Curtis Samuel —e—
Romeo Doubs ——
Rondale Moore —®—
Darnell Mooney ==
Garrett Wilson —e—
Allen Lazard = ———e——
Devin Duvernay  —e—
Drake London —=®—=

Allen Robinson || = —e—
Zay Jones —@—=
Marquez Valdes-Scanting = ==

Robert Woods -——
Kalif Raymond -——————=
Mack Hollins —=®=
Hunter Renfrow = ——®—=
DeAndre Carter)
Alec Pierce —®=
Mecole Hardman ——
Terrace Marshall Jr. ——=
Marvin Jones Jr. ==
Isaiah McKenzie = -————
Parris Campbell —e—~

Tier

Te T

In [46]:
# Find the header line, which starts with "Week"
week_time = ''

for line in text.split("\n"):
    if line.startswith('Week'):
        week_time = line

# Header format: "<week> - <position/scoring> Tiers - <timestamp>"
parts = week_time.split(" - ")
print(parts)

week = parts[0]
time = parts[2]

print(week)
print(time)

# Remove image after use
os.remove(img_name)


['Week 9', 'WR-HALF Tiers', 'Sun Nov 06 2022 10:23 PST']
Week 9
Sun Nov 06 2022 10:23 PST
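
Splitting on `" - "` works for the header shown above, but it raises an `IndexError` if OCR misreads a separator. As a rough sketch of a stricter alternative, a regular expression can validate the line and turn the timestamp into a `datetime`. `parse_header` and `HEADER_RE` are hypothetical names, and the pattern assumes every header looks like the one printed above. The timezone abbreviation is kept as a plain string because `strptime` support for `%Z` varies by platform.

```python
import re
from datetime import datetime

# Assumed header shape: "Week <n> - <label> Tiers - <timestamp> <tz>"
HEADER_RE = re.compile(r"^Week\s+(\d+)\s*-\s*(\S+)\s+Tiers\s*-\s*(.+)$")


def parse_header(line):
    """Return week, label, update time and timezone, or None if the line does not match."""
    match = HEADER_RE.match(line.strip())
    if match is None:
        return None
    week, label, stamp = match.groups()
    # Split off the trailing timezone abbreviation, e.g. "PST"
    date_part, _, tz = stamp.rpartition(" ")
    updated = datetime.strptime(date_part, "%a %b %d %Y %H:%M")
    return {"week": int(week), "label": label, "updated": updated, "tz": tz}


header = parse_header("Week 9 - WR-HALF Tiers - Sun Nov 06 2022 10:23 PST")
print(header)
```

A line that does not match returns `None`, so the caller can decide whether to skip the update or raise an error.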


## Extracting the Data

In [47]:
def get_player_tiers(position, scoring):

    """
    position: 'QB', 'RB', 'WR', 'TE', or 'FLX'
    scoring:
        "STAN": standard
        "HALF": half-ppr
        "PPR": ppr
    """

    # Build URL for tier table and image
    if position == "QB":
        url_tiers = f"https://s3-us-west-1.amazonaws.com/fftiers/out/text_{position}.txt"
        url_img = f"https://s3-us-west-1.amazonaws.com/fftiers/out/weekly-{position}.png"
    else:
        if scoring == "STAN":
            url_tiers = f"https://s3-us-west-1.amazonaws.com/fftiers/out/text_{position}.txt"
            url_img = f"https://s3-us-west-1.amazonaws.com/fftiers/out/weekly-{position}.png"
        else:   
            url_tiers = f"https://s3-us-west-1.amazonaws.com/fftiers/out/text_{position}-{scoring}.txt"
            url_img = f"https://s3-us-west-1.amazonaws.com/fftiers/out/weekly-{position}-{scoring}.png"
    
    # Get tier table with player info
    table = requests.get(url_tiers).text
    
    # Get image with week and update time
    img_name = url_img.split("/")[-1]
    tier_img = requests.get(url_img, stream=True)
    with open(img_name, 'wb') as out_file:
        shutil.copyfileobj(tier_img.raw, out_file)
    del tier_img

    # Clean up tier table
    temp = [x.strip() for x in table.replace("\n",",").split(",")]
    
    # Get week and time from image
    text = pytesseract.image_to_string(Image.open(img_name))

    # The header line starts with "Week" and looks like
    # "<week> - <position/scoring> Tiers - <timestamp>"
    week_time = ''

    for line in text.split("\n"):
        if line.startswith('Week'):
            week_time = line

    parts = week_time.split(" - ")
    week = parts[0]
    time = parts[2]

    # Remove image after use
    os.remove(img_name)

    # Get data into containers for saving into a DataFrame
    data = {}
    player_names = []
    tiers = []

    current_tier = 1

    for i in temp[:-1]:
        if i[:4] == "Tier":
            
            current_tier = int(i.split(":")[0].split(" ")[1])

            player_names.append(i.split(":")[1].strip())
            tiers.append(current_tier)
        else:
            player_names.append(i)
            tiers.append(current_tier)

    n_players = len(player_names)

    data['player_name'] = player_names
    data['position'] = [position] * n_players
    data['scoring'] = [scoring] * n_players
    data['week'] = [week] * n_players
    data['tier'] = tiers
    data['updated'] = [time] * n_players
    
    return pd.DataFrame(data)

def get_my_players(ds, player_list, scoring_list):
    
    f_players = ds['player_name'].isin(player_list)
    f_scoring = ds['scoring'].isin(scoring_list)

    # Shift the index to start at 1 so it reads as the rank within each position
    ranked = ds[f_players & f_scoring].sort_values(['position', 'scoring', 'tier'], ascending=[False, True, True])
    ranked.index = ranked.index + 1
    
    return ranked
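
The parsing loop in `get_player_tiers` implies that the text file holds one tier per line in the form `Tier N: name, name, ...`. Assuming that format (inferred from the code, not from any documentation), the same logic can be sketched without network access. `parse_tiers` is an illustrative name, and the sample string is hand-written.

```python
def parse_tiers(raw):
    """Turn 'Tier N: a, b, c' lines into a list of (player_name, tier) tuples."""
    rows = []
    for line in raw.splitlines():
        line = line.strip()
        if not line.startswith("Tier"):
            continue  # skip blank or unexpected lines
        label, _, names = line.partition(":")
        tier = int(label.split()[1])  # "Tier 2" -> 2
        for name in names.split(","):
            name = name.strip()
            if name:
                rows.append((name, tier))
    return rows


# Hand-written sample in the assumed format
sample = "Tier 1: Austin Ekeler, Alvin Kamara\nTier 2: Aaron Jones\n"

for name, tier in parse_tiers(sample):
    print(tier, name)
```

Parsing line by line keeps each player tied to the tier on their own line, so there is no running `current_tier` to track across the whole file.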

In [48]:
get_player_tiers('RB', 'HALF').head()

|   | player_name | position | scoring | week | tier | updated |
|---|---|---|---|---|---|---|
| 0 | Austin Ekeler | RB | HALF | Week 9 | 1 | Sun Nov 06 2022 10:23 PST |
| 1 | Alvin Kamara | RB | HALF | Week 9 | 1 | Sun Nov 06 2022 10:23 PST |
| 2 | Derrick Henry | RB | HALF | Week 9 | 1 | Sun Nov 06 2022 10:23 PST |
| 3 | Josh Jacobs | RB | HALF | Week 9 | 1 | Sun Nov 06 2022 10:23 PST |
| 4 | Aaron Jones | RB | HALF | Week 9 | 2 | Sun Nov 06 2022 10:23 PST |
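
The URL rules inside `get_player_tiers` reduce to one condition: QB pages and standard scoring have no scoring suffix, and everything else appends `-{scoring}`. A minimal sketch of that rule as a standalone helper follows. `build_urls` and `BASE_URL` are names introduced here for illustration, and the URL patterns are copied from the function above.

```python
BASE_URL = "https://s3-us-west-1.amazonaws.com/fftiers/out"


def build_urls(position, scoring):
    """Return (tier text URL, tier image URL) for a position and scoring system."""
    # QB and standard scoring use the bare position; other cases add the scoring suffix
    if position == "QB" or scoring == "STAN":
        suffix = position
    else:
        suffix = f"{position}-{scoring}"
    return f"{BASE_URL}/text_{suffix}.txt", f"{BASE_URL}/weekly-{suffix}.png"


print(build_urls("RB", "HALF"))
print(build_urls("QB", "HALF"))
```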


## Get Player Rankings

Imagine you are playing in a 0.5 PPR league and want to know where your QBs, RBs, WRs, TEs, and FLEX players stand. You can do the following:

In [49]:
# Positions you have on your team
positions = ['QB', 'RB', 'WR', 'FLX', 'TE']

# League scoring
# Add more if you are in multiple leagues with different scoring systems
scoring_systems = ['HALF']

# Get data
datasets = []

for scoring in scoring_systems:
    for pos in positions:
        datasets.append(get_player_tiers(pos, scoring))
    

all_players = pd.concat(datasets)
all_players.head()

|   | player_name | position | scoring | week | tier | updated |
|---|---|---|---|---|---|---|
| 0 | Josh Allen | QB | HALF | Week 9 | 1 | Sun Nov 06 2022 10:23 PST |
| 1 | Jalen Hurts | QB | HALF | Week 9 | 1 | Sun Nov 06 2022 10:23 PST |
| 2 | Patrick Mahomes II | QB | HALF | Week 9 | 1 | Sun Nov 06 2022 10:23 PST |
| 3 | Lamar Jackson | QB | HALF | Week 9 | 2 | Sun Nov 06 2022 10:23 PST |
| 4 | Kyler Murray | QB | HALF | Week 9 | 2 | Sun Nov 06 2022 10:23 PST |


### Only Your Players

In [50]:
my_players = [
        'Josh Allen', 
        'Chris Godwin', 
        'Josh Palmer', 
        'Josh Jacobs', 
        'Khalil Herbert', 
        'Taysom Hill', 
        'Michael Pittman Jr.', 
        'Justin Herbert', 
        'T.J. Hockenson']

# The index is each player's overall rank within their position and scoring system
my_player_rankings = get_my_players(all_players, my_players, ['HALF'])
my_player_rankings

|   | player_name | position | scoring | week | tier | updated |
|---|---|---|---|---|---|---|
| 12 | Chris Godwin | WR | HALF | Week 9 | 3 | Sun Nov 06 2022 10:23 PST |
| 24 | Michael Pittman Jr. | WR | HALF | Week 9 | 4 | Sun Nov 06 2022 10:23 PST |
| 6 | Taysom Hill | TE | HALF | Week 9 | 2 | Sun Nov 06 2022 10:23 PST |
| 12 | T.J. Hockenson | TE | HALF | Week 9 | 3 | Sun Nov 06 2022 10:23 PST |
| 4 | Josh Jacobs | RB | HALF | Week 9 | 1 | Sun Nov 06 2022 10:23 PST |
| 24 | Khalil Herbert | RB | HALF | Week 9 | 6 | Sun Nov 06 2022 10:23 PST |
| 1 | Josh Allen | QB | HALF | Week 9 | 1 | Sun Nov 06 2022 10:23 PST |
| 6 | Justin Herbert | QB | HALF | Week 9 | 2 | Sun Nov 06 2022 10:23 PST |
| 8 | Chris Godwin | FLX | HALF | Week 9 | 2 | Sun Nov 06 2022 10:23 PST |
| 22 | Michael Pittman Jr. | FLX | HALF | Week 9 | 5 | Sun Nov 06 2022 10:23 PST |
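
`get_my_players` filters with `isin`, which only matches exact strings. 'Josh Palmer' is in `my_players` but does not appear in the table above, while the OCR output earlier lists 'Joshua Palmer', so a spelling difference is the likely cause. Here is a minimal sketch for flagging such near misses with the standard library's `difflib`. `find_unmatched` is a hypothetical helper, and the short `known_names` list stands in for `all_players['player_name']`.

```python
from difflib import get_close_matches


def find_unmatched(roster, known_names, cutoff=0.8):
    """Map each roster name missing from known_names to its closest match (or None)."""
    known = set(known_names)
    suggestions = {}
    for name in roster:
        if name in known:
            continue  # exact match, nothing to flag
        close = get_close_matches(name, known_names, n=1, cutoff=cutoff)
        suggestions[name] = close[0] if close else None
    return suggestions


# Stand-in data; in the notebook this would come from all_players['player_name']
known_names = ["Chris Godwin", "Joshua Palmer", "Michael Pittman Jr."]
roster = ["Chris Godwin", "Josh Palmer", "Michael Pittman Jr."]

print(find_unmatched(roster, known_names))
```

Running a check like this before filtering makes silently dropped players visible, and the suggested spelling can then be copied into `my_players`.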


In [51]:
# Save table for Twitter post
my_player_rankings.dfi.export('../twitter/day14_table.png')

