In [1]:
!pip install psycopg2



In [1]:
import psycopg2
import pandas as pd

# Read the connection string; strip the trailing newline that readline() keeps
with open('../../src/pw') as pw_file:
    pw = pw_file.readline().strip()

LINK TO DATABASE INFO & SCHEMA: https://github.com/isaac-campbell-smith/Pokestars

In [2]:
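def pretty_query(cur, query, conn):
    """Run a query and return the result set as a pandas DataFrame."""
    conn.rollback()  # clear any aborted transaction left by a failed query
    cur.execute(query)
    data = cur.fetchall()
    # cur.description holds one entry per result column;
    # psycopg2 exposes the column label through its .name attribute
    try:
        headers = [head.name for head in cur.description]
    except AttributeError:
        # plain DB-API tuples: the label is the first item
        headers = [col[0] for col in cur.description]
    return pd.DataFrame(data=data, columns=headers)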
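
The fallback to `col[0]` works because the Python DB-API 2.0 specification (PEP 249) describes each entry of `cursor.description` as a 7-item sequence whose first item is the column name. Here is a minimal sketch of that behavior, using the standard-library `sqlite3` module as a stand-in for the Postgres connection. The `demo` table and its single row are made up for illustration and are not part of the Pokestars database.

```python
import sqlite3

# In-memory SQLite database, used only as a stand-in for the Postgres connection.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE demo (id INTEGER, name TEXT)")  # hypothetical table
cur.execute("INSERT INTO demo VALUES (1, 'Volcarona')")   # hypothetical row
cur.execute("SELECT id, name FROM demo")

# Each description entry is a 7-item sequence; index 0 holds the column label.
headers = [col[0] for col in cur.description]
rows = cur.fetchall()
print(headers)
print(rows)

cur.close()
conn.close()
```

The same list comprehension should work against any DB-API driver, which is why it makes a reasonable fallback when the `.name` attribute is unavailable.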

In [3]:
conn = psycopg2.connect(pw)
cur = conn.cursor()

### OVERARCHING QUESTION
Volcarona is one of my all-time favorite Pokemon: a powerful Fire/Bug type reminiscent of Mothra, with one of the highest base stat totals in the competitively legal Pokedex. Unfortunately, in the first 3 generations since its release it suffered a fate shared by Charizard, Butterfree and Articuno, known as 'Stealth Rock Syndrome'. In the most recent generation, however, a new item was released that has helped make Volcarona a Top-10 threat in the meta-game. Today we'll identify what that item is, quantify how much it boosted Volcarona's usage, and look at who else benefited from it.

### Warm-Up Question

Start by writing a query to identify the 11 Pokemon new to the competitive scene this month.
<br><br>
Your output table should include all the columns in the `pokemon` table.
<br><br>
As a reminder, ids are assigned sequentially, and the last Pokemon added to the database before this month's batch was Zarude.

In [4]:
query = """
;
"""
pretty_query(cur, query, conn)

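| | id | name | type_1 | type_2 | hp | attack | defense | sp_attack | sp_defense | speed |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 713 | Articuno-Galar | Psychic | Flying | 90 | 85 | 85 | 125 | 100 | 95 |
| 1 | 714 | Regidrago | Dragon | | 200 | 100 | 50 | 100 | 50 | 80 |
| 2 | 715 | Moltres-Galar | Dark | Flying | 90 | 85 | 90 | 100 | 125 | 90 |
| 3 | 716 | Spectrier | Ghost | | 100 | 65 | 60 | 145 | 80 | 130 |
| 4 | 717 | Regieleki | Electric | | 80 | 100 | 50 | 100 | 50 | 200 |
| 5 | 718 | Slowking-Galar | Poison | Psychic | 95 | 100 | 95 | 100 | 70 | 30 |
| 6 | 719 | Glastrier | Ice | | 100 | 145 | 130 | 65 | 110 | 30 |
| 7 | 720 | Pheromosa | Bug | Fighting | 71 | 137 | 37 | 137 | 37 | 151 |
| 8 | 721 | Blaziken | Fire | Fighting | 80 | 120 | 70 | 110 | 70 | 80 |
| 9 | 722 | Genesect | Bug | Steel | 71 | 120 | 95 | 120 | 95 | 99 |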


<details><summary>
Solution:
</summary>
<pre>
SELECT * FROM pokemon 
 WHERE id > (SELECT id FROM pokemon WHERE name='Zarude')
;
</pre>
</details>

### Question 1

Let's get into pivot tables. Some SQL dialects have a PIVOT command (PostgreSQL instead offers `crosstab` through the `tablefunc` extension), but these are a bit fussy and not as intuitive to code, in my opinion. I prefer using CASE WHEN statements and recommend exploring that here.
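
The CASE WHEN trick relies on two facts: a CASE with no ELSE returns NULL for rows that don't match, and AVG skips NULLs. Each CASE expression therefore becomes one column of the pivot. Below is a minimal sketch of the idea on a toy table, run through the standard-library `sqlite3` module so it works without the Pokestars database. The `toy` table, its names and its scores are all made up for illustration.

```python
import sqlite3

# In-memory SQLite database with a hypothetical toy table.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE toy (name TEXT, score REAL)")
cur.executemany(
    "INSERT INTO toy VALUES (?, ?)",
    [("red-a", 1.0), ("red-b", 3.0), ("blue-a", 10.0), ("blue-b", 20.0)],
)

# Each CASE yields the score for rows in its group and NULL otherwise;
# AVG ignores the NULLs, so every output column averages one group only.
cur.execute("""
    SELECT AVG(CASE WHEN name LIKE 'red%'  THEN score END) AS red_avg,
           AVG(CASE WHEN name LIKE 'blue%' THEN score END) AS blue_avg
      FROM toy
""")
pivot = cur.fetchone()
print(pivot)  # (2.0, 15.0)

cur.close()
conn.close()
```

The same pattern carries over to Postgres unchanged; only the grouping conditions inside each CASE need to change.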

There are 3 different categories we can draw from this 'new' batch of Pokemon. The 4 Galar formes are the most obvious: they are tweaked versions of classic Pokemon. There are also 4 completely brand-new Pokemon: the 2 Regis and the 2 horses (Spectrier and Glastrier). The remaining 3 Pokemon have been around for multiple generations but have been on the ban list since at least 2014 (the start of the database).

Create a 'pivot table' with one column per group, each holding that group's average October usage. Try to use string comparison operators (such as LIKE) rather than hard-coding your lists.

In [5]:
query = """
;
"""
pretty_query(cur, query, conn)

| | galarian_formes | unbanned | new_pokemon |
|---|---|---|---|
| 0 | 0.034166 | 0.096014 | 0.067805 |


<details><summary>
Solution:
</summary>
<pre>
SELECT AVG(g) AS galarian_formes, AVG(u) AS unbanned, AVG(n) AS new_pokemon
  FROM
      (
       SELECT CASE 
                  WHEN name LIKE '%Galar' 
              THEN usage END AS g,
              
              CASE 
                  WHEN name NOT LIKE '%Galar' 
                   AND name NOT LIKE 'Regi%' 
                   AND name NOT LIKE '%rier' 
              THEN usage END AS u,
              
              CASE
                  WHEN name LIKE 'Regi%' 
                    OR name LIKE '%rier' 
              THEN usage END AS n

         FROM 
             (
               SELECT p.name, b.usage 
                 FROM battles AS b 
                 JOIN 
                     (
                        SELECT name, id 
                          FROM pokemon 
                         WHERE id > (SELECT id FROM pokemon WHERE name='Zarude')
                     ) AS p
                   ON b.id = p.id
              ) AS t
        ) AS pt
    
;
</pre>
</details>

### Question 2
Building off of our t-test from last time, let's use a *very* simple ML model to get a better understanding of the current meta-game.<br><br>
First, query all stats for the previous Pokemon group (ignore type; just get hp, attack, etc.) and join the `usage` column from the `battles` table. Next, fit a decision tree regressor to the data and plot your tree. Interpret the results. For extra credit, get all Pokemon usage and stats from last month and see how well the model performs.

In [6]:
query = """
;
"""
df = pretty_query(cur, query, conn)
df

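| | hp | attack | defense | sp_attack | sp_defense | speed | usage |
|---|---|---|---|---|---|---|---|
| 0 | 100 | 145 | 130 | 65 | 110 | 30 | 0.026887 |
| 1 | 100 | 65 | 60 | 145 | 80 | 130 | 0.094396 |
| 2 | 71 | 137 | 37 | 137 | 37 | 151 | 0.082258 |
| 3 | 80 | 100 | 50 | 100 | 50 | 200 | 0.123285 |
| 4 | 80 | 120 | 70 | 110 | 70 | 80 | 0.087145 |
| 5 | 71 | 120 | 95 | 120 | 95 | 99 | 0.118639 |
| 6 | 90 | 125 | 90 | 85 | 90 | 100 | 0.085417 |
| 7 | 200 | 100 | 50 | 100 | 50 | 80 | 0.02665 |
| 8 | 95 | 100 | 95 | 100 | 70 | 30 | 0.007119 |
| 9 | 90 | 85 | 85 | 125 | 100 | 95 | 0.020636 |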


<details><summary>
Solution:
</summary>
<pre>
SELECT  hp, attack, defense, sp_attack, sp_defense, speed, usage
  FROM battles AS b 
  JOIN 
       (
        SELECT *
          FROM pokemon 
         WHERE id > (SELECT id FROM pokemon WHERE name='Zarude')
       ) AS p
    ON b.id = p.id
    
;
</pre>
</details>

In [7]:
from sklearn import tree
import matplotlib.pyplot as plt

In [8]:
#SKLEARN ANSWER HERE

<details><summary>
Solution:
</summary>
<pre>
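y = df.pop('usage')  # removes the target column from df, leaving only the features
dt = tree.DecisionTreeRegressor()

dt.fit(df, y)

fig, ax = plt.subplots(figsize=(14, 14))
# pass a plain list: some scikit-learn versions reject a pandas Index here
tree.plot_tree(dt, ax=ax, feature_names=list(df.columns));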
</pre>
</details>
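
When interpreting the plotted tree, it helps to know how a regression tree picks each split: it tries candidate thresholds on every feature and keeps the one that leaves the smallest total squared error around the mean usage on each side. The sketch below runs that search once, in plain Python, on the ten rows shown in the query output above. It is a simplified illustration of the squared-error criterion, not scikit-learn's actual implementation, and the helper names `sse` and `best_first_split` are made up for this example. Ties and feature ordering may be handled differently by scikit-learn, so the root split in your plot can differ.

```python
# Rows copied from the query output shown earlier in this notebook:
# hp, attack, defense, sp_attack, sp_defense, speed, usage
columns = ["hp", "attack", "defense", "sp_attack", "sp_defense", "speed"]
rows = [
    (100, 145, 130, 65, 110, 30, 0.026887),
    (100, 65, 60, 145, 80, 130, 0.094396),
    (71, 137, 37, 137, 37, 151, 0.082258),
    (80, 100, 50, 100, 50, 200, 0.123285),
    (80, 120, 70, 110, 70, 80, 0.087145),
    (71, 120, 95, 120, 95, 99, 0.118639),
    (90, 125, 90, 85, 90, 100, 0.085417),
    (200, 100, 50, 100, 50, 80, 0.02665),
    (95, 100, 95, 100, 70, 30, 0.007119),
    (90, 85, 85, 125, 100, 95, 0.020636),
]


def sse(values):
    """Sum of squared deviations from the mean (0 for an empty list)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_first_split(rows, columns):
    """Try every midpoint threshold on every feature; keep the lowest total SSE."""
    best = None
    for j, col in enumerate(columns):
        levels = sorted({r[j] for r in rows})
        # Midpoints between consecutive distinct values keep both sides non-empty.
        for lo, hi in zip(levels, levels[1:]):
            threshold = (lo + hi) / 2
            left = [r[-1] for r in rows if r[j] <= threshold]
            right = [r[-1] for r in rows if r[j] > threshold]
            cost = sse(left) + sse(right)
            if best is None or cost < best[0]:
                best = (cost, col, threshold, len(left), len(right))
    return best


cost, feature, threshold, n_left, n_right = best_first_split(rows, columns)
print(f"first split: {feature} <= {threshold} "
      f"({n_left} vs {n_right} rows), SSE={cost:.6f}")
```

With only ten rows, a fully grown tree will memorize the data, so treat the first split or two as the interesting part and the deeper nodes as noise.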

# RUN THIS CELL WHEN YOU'RE DONE OR ELSE I WILL FIND YOU AND HURT YOU

In [9]:
cur.close()  # Close the cursor
conn.close()
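
If you'd rather not depend on remembering to run that cell, one option is to wrap the cursor and connection in `contextlib.closing`, which calls `.close()` on exit even if a query raises. The sketch below uses the standard-library `sqlite3` module as a stand-in so it runs anywhere; the assumption is that the same pattern applies to a psycopg2 connection, since `closing` only needs an object with a `.close()` method. Note that psycopg2's own `with conn:` block ends the transaction but does not close the connection.

```python
import sqlite3
from contextlib import closing

# sqlite3 stands in for psycopg2 here; closing() only needs a .close() method.
with closing(sqlite3.connect(":memory:")) as conn:
    with closing(conn.cursor()) as cur:
        cur.execute("SELECT 1 + 1")
        result = cur.fetchone()[0]

# Both the cursor and the connection are closed at this point.
print(result)  # 2
```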