# Using Retrosheet and PySpark to Solve an Immaculate Grid Puzzle

### In this Notebook I will be solving an immaculate grid puzzle hosted on [Immaculate Grid Baseball](https://www.immaculategrid.com/)
### I will be using Play-By-Play Game Data sourced from [Retrosheet](https://www.retrosheet.org/) from 1912-2024. This results in a large dataset (15M records). So, I will use PySpark for effectively querying this large dataset so that I can solve the immaculate grid puzzles

In [1]:
from pyspark.sql import SparkSession
from ucimlrepo import fetch_ucirepo
from pyspark.sql import functions as F
from pyspark.sql import Window
import matplotlib.pyplot as plt
import seaborn as sns
import os
print(f'{os.getcwd()}')
import pandas as pd

/home/jdwolfe/Spark-Projects/spark-learning-notebooks


## Initializing the Spark Session (Running in Docker)

In [16]:
spark = SparkSession.builder.appName("Jupyter").getOrCreate()

spark

## Read the decade Play-by-Play Files Can be found here: [Retrosheet Decade Zip Files](https://www.retrosheet.org/game.htm) [How to extract Event files](https://www.retrosheet.org/datause.html)

In [17]:
csv_headers = [
    "game id", "visiting team", "inning", "batting team", "outs", "balls", "strikes",
    "pitch sequence", "vis score", "home score", "batter", "batter hand", "res batter",
    "res batter hand", "pitcher", "pitcher hand", "res pitcher", "res pitcher hand",
    "catcher", "first base", "second base", "third base", "shortstop", "left field",
    "center field", "right field", "first runner", "second runner", "third runner",
    "event text", "leadoff flag", "pinchhit flag", "defensive position", "lineup position",
    "event type", "batter event flag", "ab flag", "hit value", "SH flag", "SF flag",
    "outs on play", "double play flag", "triple play flag", "RBI on play", "wild pitch flag",
    "passed ball flag", "fielded by", "batted ball type", "bunt flag", "foul flag",
    "hit location", "num errors", "1st error player", "1st error type", "2nd error player",
    "2nd error type", "3rd error player", "3rd error type", "batter dest",
    "runner on 1st dest", "runner on 2nd dest", "runner on 3rd dest", "play on batter",
    "play on runner on 1st", "play on runner on 2nd", "play on runner on 3rd",
    "SB for runner on 1st flag", "SB for runner on 2nd flag", "SB for runner on 3rd flag",
    "CS for runner on 1st flag", "CS for runner on 2nd flag", "CS for runner on 3rd flag",
    "PO for runner on 1st flag", "PO for runner on 2nd flag", "PO for runner on 3rd flag",
    "Responsible pitcher for runner on 1st", "Responsible pitcher for runner on 2nd",
    "Responsible pitcher for runner on 3rd", "New Game Flag", "End Game Flag",
    "Pinch-runner on 1st", "Pinch-runner on 2nd", "Pinch-runner on 3rd",
    "Runner removed for pinch-runner on 1st", "Runner removed for pinch-runner on 2nd",
    "Runner removed for pinch-runner on 3rd", "Batter removed for pinch-hitter",
    "Position of batter removed for pinch-hitter", "Fielder with First Putout",
    "Fielder with Second Putout", "Fielder with Third Putout", "Fielder with First Assist",
    "Fielder with Second Assist", "Fielder with Third Assist", "Fielder with Fourth Assist",
    "Fielder with Fifth Assist", "event num"
]

full_df = None

for file in os.listdir(f'{os.getcwd()}/data/mlb_data/decade_pbp_files'):
    file_path = f'{os.getcwd()}/data/mlb_data/decade_pbp_files/{file}'
    print(f'Processing {file_path}')
    df = spark.read.csv(file_path, header=False, inferSchema=True).toDF(*csv_headers)

    if full_df is None:
        full_df = df
    else:
        full_df = full_df.unionByName(df)

full_df.show()


Processing /home/jdwolfe/Spark-Projects/spark-learning-notebooks/data/mlb_data/decade_pbp_files/full_1910s_pbp.csv


                                                                                

Processing /home/jdwolfe/Spark-Projects/spark-learning-notebooks/data/mlb_data/decade_pbp_files/full_1970s_pbp.csv


                                                                                

Processing /home/jdwolfe/Spark-Projects/spark-learning-notebooks/data/mlb_data/decade_pbp_files/full_1930s_pbp.csv


                                                                                

Processing /home/jdwolfe/Spark-Projects/spark-learning-notebooks/data/mlb_data/decade_pbp_files/full_2000s_pbp.csv


                                                                                

Processing /home/jdwolfe/Spark-Projects/spark-learning-notebooks/data/mlb_data/decade_pbp_files/full_1960s_pbp.csv


                                                                                

Processing /home/jdwolfe/Spark-Projects/spark-learning-notebooks/data/mlb_data/decade_pbp_files/full_1950s_pbp.csv


                                                                                

Processing /home/jdwolfe/Spark-Projects/spark-learning-notebooks/data/mlb_data/decade_pbp_files/full_1940s_pbp.csv


                                                                                

Processing /home/jdwolfe/Spark-Projects/spark-learning-notebooks/data/mlb_data/decade_pbp_files/full_1920s_pbp.csv


                                                                                

Processing /home/jdwolfe/Spark-Projects/spark-learning-notebooks/data/mlb_data/decade_pbp_files/full_1980s_pbp.csv


                                                                                

Processing /home/jdwolfe/Spark-Projects/spark-learning-notebooks/data/mlb_data/decade_pbp_files/full_1990s_pbp.csv


                                                                                

Processing /home/jdwolfe/Spark-Projects/spark-learning-notebooks/data/mlb_data/decade_pbp_files/full_2020s_pbp.csv


                                                                                

Processing /home/jdwolfe/Spark-Projects/spark-learning-notebooks/data/mlb_data/decade_pbp_files/full_2010s_pbp.csv


                                                                                

+------------+-------------+------+------------+----+-----+-------+--------------+---------+----------+--------+-----------+----------+---------------+--------+------------+-----------+----------------+--------+----------+-----------+----------+---------+----------+------------+-----------+------------+-------------+------------+---------------+------------+-------------+------------------+---------------+----------+-----------------+-------+---------+-------+-------+------------+----------------+----------------+-----------+---------------+----------------+----------+----------------+---------+---------+------------+----------+----------------+--------------+----------------+--------------+----------------+--------------+-----------+------------------+------------------+------------------+--------------+---------------------+---------------------+---------------------+-------------------------+-------------------------+-------------------------+-------------------------+--------------

In [4]:
full_df.printSchema()

root
 |-- game id: string (nullable = true)
 |-- visiting team: string (nullable = true)
 |-- inning: integer (nullable = true)
 |-- batting team: integer (nullable = true)
 |-- outs: integer (nullable = true)
 |-- balls: integer (nullable = true)
 |-- strikes: integer (nullable = true)
 |-- pitch sequence: string (nullable = true)
 |-- vis score: integer (nullable = true)
 |-- home score: integer (nullable = true)
 |-- batter: string (nullable = true)
 |-- batter hand: string (nullable = true)
 |-- res batter: string (nullable = true)
 |-- res batter hand: string (nullable = true)
 |-- pitcher: string (nullable = true)
 |-- pitcher hand: string (nullable = true)
 |-- res pitcher: string (nullable = true)
 |-- res pitcher hand: string (nullable = true)
 |-- catcher: string (nullable = true)
 |-- first base: string (nullable = true)
 |-- second base: string (nullable = true)
 |-- third base: string (nullable = true)
 |-- shortstop: string (nullable = true)
 |-- left field: string (nulla

In [5]:
full_df.count()

                                                                                

15217681

In [6]:
all_teams = full_df \
        .select('visiting team') \
        .distinct()


all_teams.show(all_teams.count())



+-------------+
|visiting team|
+-------------+
|          SLN|
|          NY1|
|          BRO|
|          CIN|
|          PHI|
|          BSN|
|          PIT|
|          CHN|
|          PHA|
|          NYA|
|          DET|
|          SLA|
|          WS1|
|          CHA|
|          BOS|
|          CLE|
|          SFN|
|          LAN|
|          SDN|
|          MON|
|          ATL|
|          HOU|
|          NYN|
|          OAK|
|          BAL|
|          CAL|
|          MIN|
|          WS2|
|          KCA|
|          MIL|
|            @|
|          TBA|
|          FLO|
|          ANA|
|          TOR|
|          ARI|
|          SEA|
|          COL|
|          TEX|
|          MLN|
|          KC1|
|          MIA|
|          WAS|
+-------------+



                                                                                

## Create new columns for Season and the Home Team

In [6]:
full_df = full_df.withColumn('Season', F.substring(F.col('game id'), 4, 4).cast('int'))

full_df = full_df.withColumn('home team', F.substring(F.col('game id'), 1, 3))

full_df.show()

+------------+-------------+------+------------+----+-----+-------+--------------+---------+----------+--------+-----------+----------+---------------+--------+------------+-----------+----------------+--------+----------+-----------+----------+---------+----------+------------+-----------+------------+-------------+------------+---------------+------------+-------------+------------------+---------------+----------+-----------------+-------+---------+-------+-------+------------+----------------+----------------+-----------+---------------+----------------+----------+----------------+---------+---------+------------+----------+----------------+--------------+----------------+--------------+----------------+--------------+-----------+------------------+------------------+------------------+--------------+---------------------+---------------------+---------------------+-------------------------+-------------------------+-------------------------+-------------------------+--------------

In [7]:
full_df.select(
    F.min(
        F.col('Season')
    ).alias('Min Season of Data'),

    F.max(
        F.col('Season')
    ).alias('Max Season of Data'),
).distinct().show()



+------------------+------------------+
|Min Season of Data|Max Season of Data|
+------------------+------------------+
|              1912|              2024|
+------------------+------------------+



                                                                                

## Bring in the Biofile so we can get player names

In [8]:

file_path = f'{os.getcwd()}/data/mlb_data/biofile.csv'
bio_df = spark.read.csv(file_path, header=True, inferSchema=True)

bio_df.show()

+--------+-----------+----------------+--------+----------+------------+-------------+------------------+----------+-------------+---------+------------+-----------+--------------+----------+------------+----------+---------------+-----------+-------------+----+------+------+------+--------------------+---------------+------------+------------+---------+--------------------+--------+-------+----+
|PLAYERID|       LAST|           FIRST|NICKNAME| BIRTHDATE|  BIRTH.CITY|  BIRTH.STATE|     BIRTH.COUNTRY|PLAY.DEBUT|PLAY.LASTGAME|MGR.DEBUT|MGR.LASTGAME|COACH.DEBUT|COACH.LASTGAME| UMP.DEBUT|UMP.LASTGAME| DEATHDATE|     DEATH.CITY|DEATH.STATE|DEATH.COUNTRY|BATS|THROWS|HEIGHT|WEIGHT|            CEMETERY|      CEME.CITY|  CEME.STATE|CEME.COUNTRY|CEME.NOTE|          BIRTH.NAME|NAME.CHG|BAT.CHG| HOF|
+--------+-----------+----------------+--------+----------+------------+-------------+------------------+----------+-------------+---------+------------+-----------+--------------+----------+---------

In [9]:
bio_df.printSchema()

root
 |-- PLAYERID: string (nullable = true)
 |-- LAST: string (nullable = true)
 |-- FIRST: string (nullable = true)
 |-- NICKNAME: string (nullable = true)
 |-- BIRTHDATE: string (nullable = true)
 |-- BIRTH.CITY: string (nullable = true)
 |-- BIRTH.STATE: string (nullable = true)
 |-- BIRTH.COUNTRY: string (nullable = true)
 |-- PLAY.DEBUT: string (nullable = true)
 |-- PLAY.LASTGAME: string (nullable = true)
 |-- MGR.DEBUT: string (nullable = true)
 |-- MGR.LASTGAME: string (nullable = true)
 |-- COACH.DEBUT: string (nullable = true)
 |-- COACH.LASTGAME: string (nullable = true)
 |-- UMP.DEBUT: string (nullable = true)
 |-- UMP.LASTGAME: string (nullable = true)
 |-- DEATHDATE: string (nullable = true)
 |-- DEATH.CITY: string (nullable = true)
 |-- DEATH.STATE: string (nullable = true)
 |-- DEATH.COUNTRY: string (nullable = true)
 |-- BATS: string (nullable = true)
 |-- THROWS: string (nullable = true)
 |-- HEIGHT: string (nullable = true)
 |-- WEIGHT: double (nullable = true)
 |--

## I ran this to test how different it is to try to put all the data into a pandas dataframe and it took over 15min and it was still not done loading, so skip this step

In [10]:
pandas_full = pd.DataFrame()
for file in os.listdir(f'{os.getcwd()}/data/mlb_data'):
    file_path = f'{os.getcwd()}/data/mlb_data/{file}'
    print(f'Processing {file_path}')
    df = pd.read_csv(file_path)

    pandas_full = pd.concat([pandas_full, df])


print(pandas_full.head())

Processing /home/jdwolfe/Spark-Projects/spark-learning-notebooks/data/mlb_data/decade_pbp_files


IsADirectoryError: [Errno 21] Is a directory: '/home/jdwolfe/Spark-Projects/spark-learning-notebooks/data/mlb_data/decade_pbp_files'

### ^^ Pandas was taking almost 10 min here to process similar data to pyspark it didnt finish so I skipped it

## Query Practice!

### All one inning cycle pitching leaders

In [11]:
pitcher_inning_cycles = full_df.groupby('pitcher', 'game id', 'inning').agg(
    F.sum(F.when(F.col('event type') == 20, 1).otherwise(0)).alias('singles'),
    F.sum(F.when(F.col('event type') == 21, 1).otherwise(0)).alias('doubles'),
    F.sum(F.when(F.col('event type') == 22, 1).otherwise(0)).alias('triples'),
    F.sum(F.when(F.col('event type') == 23, 1).otherwise(0)).alias('home runs')
)


bio_select   = bio_df.select('PLAYERID', 'NICKNAME', 'LAST')


pitcher_inning_cycles = pitcher_inning_cycles \
    .filter(
        (F.col('singles') > 0) &
        (F.col('doubles') > 0) &
        (F.col('triples') > 0) &
        (F.col('home runs') > 0)
    ) \
    .groupby('pitcher') \
    .agg(
    F.count('pitcher').alias('count')
)



pitcher_inning_cycles = pitcher_inning_cycles \
                        .join(bio_select, bio_select.PLAYERID == pitcher_inning_cycles.pitcher, 'left') \
                        .drop('PLAYERID', 'pitcher') \
                        .sort('count', ascending=False )  \
                        .show()





25/05/12 20:37:45 WARN RowBasedKeyValueBatch: Calling spill() on RowBasedKeyValueBatch. Will not spill but return 0.
25/05/12 20:37:45 WARN RowBasedKeyValueBatch: Calling spill() on RowBasedKeyValueBatch. Will not spill but return 0.
25/05/12 20:37:45 WARN RowBasedKeyValueBatch: Calling spill() on RowBasedKeyValueBatch. Will not spill but return 0.
25/05/12 20:37:45 WARN RowBasedKeyValueBatch: Calling spill() on RowBasedKeyValueBatch. Will not spill but return 0.
25/05/12 20:37:45 WARN RowBasedKeyValueBatch: Calling spill() on RowBasedKeyValueBatch. Will not spill but return 0.
25/05/12 20:37:45 WARN RowBasedKeyValueBatch: Calling spill() on RowBasedKeyValueBatch. Will not spill but return 0.
25/05/12 20:37:45 WARN RowBasedKeyValueBatch: Calling spill() on RowBasedKeyValueBatch. Will not spill but return 0.
25/05/12 20:37:45 WARN RowBasedKeyValueBatch: Calling spill() on RowBasedKeyValueBatch. Will not spill but return 0.
25/05/12 20:37:45 WARN RowBasedKeyValueBatch: Calling spill() on

+-----+--------+----------+
|count|NICKNAME|      LAST|
+-----+--------+----------+
|    6|   Edwin|   Jackson|
|    4|  Nelson|    Briles|
|    4|   Lance|      Lynn|
|    3|   Kevin|  Millwood|
|    3|     Bob|    Forsch|
|    3|   Jesse|    Haines|
|    3|    Kyle|    Davies|
|    3|     Don|  Cardwell|
|    3|    Bill|      Doak|
|    3|  Felipe|   Paulino|
|    3|     Red|     Lucas|
|    3|     Bob|    Friend|
|    3|    Carl|   Hubbell|
|    3|     Jim|     Owens|
|    3|     Sid|    Hudson|
|    3|      Ed|    Brandt|
|    3|    Milt|    Pappas|
|    3|    Bill|Gullickson|
|    3|    Bill|    Wegman|
|    3|     Red|   Ruffing|
+-----+--------+----------+
only showing top 20 rows



### Cycles by Players Leaders

In [12]:
bio_select   = bio_df.select('PLAYERID', 'NICKNAME', 'LAST')

cycle_batters = full_df.groupby('batter', 'game id').agg(
    F.sum(F.when(F.col('event type') == 20, 1).otherwise(0)).alias('singles'),
    F.sum(F.when(F.col('event type') == 21, 1).otherwise(0)).alias('doubles'),
    F.sum(F.when(F.col('event type') == 22, 1).otherwise(0)).alias('triples'),
    F.sum(F.when(F.col('event type') == 23, 1).otherwise(0)).alias('home runs')
    ) \
    .filter(

        (F.col('singles') > 0) &
        (F.col('doubles') > 0) &
        (F.col('triples') > 0) &
        (F.col('home runs') > 0)

    ).groupby('batter') \
    .agg(
    F.count('batter').alias('count')
)

cycle_batters = cycle_batters \
                .join(bio_select, bio_select.PLAYERID == cycle_batters.batter, 'left') \
                .sort('count', ascending=False ) .show()



25/05/12 20:38:13 WARN RowBasedKeyValueBatch: Calling spill() on RowBasedKeyValueBatch. Will not spill but return 0.
25/05/12 20:38:13 WARN RowBasedKeyValueBatch: Calling spill() on RowBasedKeyValueBatch. Will not spill but return 0.
25/05/12 20:38:13 WARN RowBasedKeyValueBatch: Calling spill() on RowBasedKeyValueBatch. Will not spill but return 0.
25/05/12 20:38:13 WARN RowBasedKeyValueBatch: Calling spill() on RowBasedKeyValueBatch. Will not spill but return 0.
25/05/12 20:38:13 WARN RowBasedKeyValueBatch: Calling spill() on RowBasedKeyValueBatch. Will not spill but return 0.
25/05/12 20:38:13 WARN RowBasedKeyValueBatch: Calling spill() on RowBasedKeyValueBatch. Will not spill but return 0.
25/05/12 20:38:13 WARN RowBasedKeyValueBatch: Calling spill() on RowBasedKeyValueBatch. Will not spill but return 0.
25/05/12 20:38:13 WARN RowBasedKeyValueBatch: Calling spill() on RowBasedKeyValueBatch. Will not spill but return 0.
25/05/12 20:38:13 WARN RowBasedKeyValueBatch: Calling spill() on

+--------+-----+--------+---------+---------+
|  batter|count|PLAYERID| NICKNAME|     LAST|
+--------+-----+--------+---------+---------+
|meusb101|    3|meusb101|      Bob|   Meusel|
|yelic001|    3|yelic001|Christian|   Yelich|
|belta001|    3|belta001|   Adrian|   Beltre|
|hermb102|    3|hermb102|     Babe|   Herman|
|turnt001|    3|turnt001|     Trea|   Turner|
|watsb001|    2|watsb001|      Bob|   Watson|
|gehrl101|    2|gehrl101|      Lou|   Gehrig|
|olerj001|    2|olerj001|     John|   Olerud|
|speic001|    2|speic001|    Chris|   Speier|
|westw102|    2|westw102|    Wally| Westlake|
|freef001|    2|freef001|  Freddie|  Freeman|
|arenn001|    2|arenn001|    Nolan|  Arenado|
|wilkb002|    2|wilkb002|     Brad|Wilkerson|
|boyek101|    2|boyek101|      Ken|    Boyer|
|fregj101|    2|fregj101|      Jim|  Fregosi|
|kleic101|    2|kleic101|    Chuck|    Klein|
|gomec002|    2|gomec002|   Carlos|    Gomez|
|hilla001|    2|hilla001|    Aaron|     Hill|
|cuddm001|    2|cuddm001|  Michael

### Hit Leaders

In [13]:
bio_select   = bio_df.select('PLAYERID', 'NICKNAME', 'LAST')


hit_leaders = full_df.groupby('batter').agg(
    F.sum(F.when(F.col('event type').isin(20,21,22,23), 1).otherwise(0)).alias('total_hits')

)

hit_leaders \
            .join(bio_select, bio_select.PLAYERID == hit_leaders.batter, 'left') \
            .select('NICKNAME', 'LAST', 'total_hits') \
            .sort('total_hits', ascending=False) .show()



+--------+-----------+----------+
|NICKNAME|       LAST|total_hits|
+--------+-----------+----------+
|    Pete|       Rose|      4256|
|    Hank|      Aaron|      3698|
|   Derek|      Jeter|      3465|
|    Carl|Yastrzemski|      3419|
|  Albert|     Pujols|      3384|
|    Paul|    Molitor|      3319|
|    Stan|     Musial|      3317|
|   Eddie|     Murray|      3255|
|  Willie|       Mays|      3238|
|     Cal|     Ripken|      3184|
|  Miguel|    Cabrera|      3174|
|  Adrian|     Beltre|      3166|
|  George|      Brett|      3154|
|   Robin|      Yount|      3142|
|    Tony|      Gwynn|      3141|
|    Alex|  Rodriguez|      3115|
|    Dave|   Winfield|      3110|
|  Ichiro|     Suzuki|      3089|
|   Craig|     Biggio|      3060|
|  Rickey|  Henderson|      3055|
+--------+-----------+----------+
only showing top 20 rows



                                                                                

### Home run Leaders

In [14]:
bio_select   = bio_df.select('PLAYERID', 'NICKNAME', 'LAST')


hit_leaders = full_df.groupby('batter').agg(
    F.sum(F.when(F.col('event type') == 23, 1).otherwise(0)).alias('total_hrs')

)

hit_leaders \
            .join(bio_select, bio_select.PLAYERID == hit_leaders.batter, 'left') \
            .select('NICKNAME', 'LAST', 'total_hrs') \
            .sort('total_hrs', ascending=False) .show()



+--------+---------+---------+
|NICKNAME|     LAST|total_hrs|
+--------+---------+---------+
|   Barry|    Bonds|      762|
|    Hank|    Aaron|      747|
|  Albert|   Pujols|      703|
|    Babe|     Ruth|      698|
|    Alex|Rodriguez|      696|
|  Willie|     Mays|      655|
|     Ken|  Griffey|      630|
|     Jim|    Thome|      612|
|   Sammy|     Sosa|      609|
|   Frank| Robinson|      586|
|    Mark|  McGwire|      583|
|  Harmon|Killebrew|      573|
|  Rafael| Palmeiro|      569|
|  Reggie|  Jackson|      563|
|   Manny|  Ramirez|      555|
|    Mike|  Schmidt|      548|
|   David|    Ortiz|      541|
|  Mickey|   Mantle|      536|
|  Willie|  McCovey|      521|
|   Frank|   Thomas|      521|
+--------+---------+---------+
only showing top 20 rows



                                                                                

### Hit Leaders from 2000-2024

In [15]:
bio_select   = bio_df.select('PLAYERID', 'NICKNAME', 'LAST')

HIT_LEADERS_2000 = (full_df
.filter(F.col('Season') >= 2000)
 .groupby('batter').agg(
    F.sum(F.when(F.col('event type').isin(20,21,22,23), 1).otherwise(0)).alias('total_hits')

))


HIT_LEADERS_2000 \
            .join(bio_select, bio_select.PLAYERID == hit_leaders.batter, 'left') \
            .select('NICKNAME', 'LAST', 'total_hits') \
            .sort('total_hits', ascending=False) .show()



+--------+---------+----------+
|NICKNAME|     LAST|total_hits|
+--------+---------+----------+
|  Albert|   Pujols|      3384|
|  Miguel|  Cabrera|      3174|
|  Ichiro|   Suzuki|      3089|
|  Adrian|   Beltre|      2976|
|   Derek|    Jeter|      2658|
|Robinson|     Cano|      2639|
|  Carlos|  Beltran|      2515|
|   Jimmy|  Rollins|      2455|
|    Nick| Markakis|      2388|
|   David|    Ortiz|      2379|
| Michael|    Young|      2375|
|   Torii|   Hunter|      2350|
|    Alex|Rodriguez|      2324|
| Freddie|  Freeman|      2267|
|  Aramis|  Ramirez|      2234|
|    Jose|   Altuve|      2232|
|    Juan|   Pierre|      2217|
|  Yadier|   Molina|      2168|
|  Miguel|   Tejada|      2153|
|  Victor| Martinez|      2153|
+--------+---------+----------+
only showing top 20 rows



                                                                                

### Hit Leaders from 2010-2024

In [15]:
bio_select   = bio_df.select('PLAYERID', 'NICKNAME', 'LAST')

HIT_LEADERS_2010 = (full_df
.filter(F.col('Season') >= 2010)
 .groupby('batter').agg(
    F.sum(F.when(F.col('event type').isin(20,21,22,23), 1).otherwise(0)).alias('total_hits')

))


HIT_LEADERS_2010 \
            .join(bio_select, bio_select.PLAYERID == hit_leaders.batter, 'left') \
            .select('NICKNAME', 'LAST', 'total_hits') \
            .sort('total_hits', ascending=False) .show()



+--------+-----------+----------+
|NICKNAME|       LAST|total_hits|
+--------+-----------+----------+
| Freddie|    Freeman|      2267|
|    Jose|     Altuve|      2232|
|    Paul|Goldschmidt|      2056|
|  Andrew|  McCutchen|      2028|
|   Elvis|     Andrus|      1963|
|  Miguel|    Cabrera|      1954|
|   Manny|    Machado|      1900|
|   Nolan|    Arenado|      1826|
| Charlie|   Blackmon|      1805|
|    Joey|      Votto|      1801|
|  Nelson|       Cruz|      1793|
|  Carlos|    Santana|      1789|
|Robinson|       Cano|      1764|
|    Eric|     Hosmer|      1753|
|    J.D.|   Martinez|      1741|
|      DJ|   LeMahieu|      1738|
| Starlin|     Castro|      1722|
|  Xander|   Bogaerts|      1693|
|    Nick|   Markakis|      1684|
|   Bryce|     Harper|      1670|
+--------+-----------+----------+
only showing top 20 rows



                                                                                

### Hit Leader for Every Season

In [17]:
window_for_hit_leaders = Window.partitionBy("Season",).orderBy(F.desc("total_hits"))


season_hit_leaders = full_df \
 .groupby('batter', 'Season').agg(
    F.sum(F.when(F.col('event type').isin(20,21,22,23), 1).otherwise(0)).alias('total_hits')
)\
    .withColumn('season_rank', F.rank().over(window_for_hit_leaders)) \
    .filter(F.col('season_rank') == 1) \
    .drop('season_rank')

bio_select   = bio_df.select('PLAYERID', 'NICKNAME', 'LAST')

season_hit_leaders = season_hit_leaders.join(bio_select, bio_select.PLAYERID == season_hit_leaders.batter) \
                    .drop('PLAYERID', 'batter') \
                    .select('NICKNAME', 'LAST', 'Season', 'total_hits')


season_hit_leaders.show(season_hit_leaders.count())



+------------+--------------+------+----------+
|    NICKNAME|          LAST|Season|total_hits|
+------------+--------------+------+----------+
|          Ty|          Cobb|  1912|       226|
|Shoeless Joe|       Jackson|  1912|       226|
|         Max|         Carey|  1913|       171|
|        Tris|       Speaker|  1914|       187|
|          Ty|          Cobb|  1915|       208|
|        Tris|       Speaker|  1916|       196|
|          Ty|          Cobb|  1917|       225|
|      George|         Burns|  1918|       177|
|       Bobby|         Veach|  1919|       189|
|      George|        Sisler|  1920|       257|
|        Jack|         Tobin|  1921|       230|
|      Rogers|       Hornsby|  1921|       230|
|      Rogers|       Hornsby|  1922|       248|
|     Frankie|        Frisch|  1923|       220|
|        Zack|         Wheat|  1924|       209|
|      Rogers|       Hornsby|  1924|       209|
|          Al|       Simmons|  1925|       248|
|      George|         Burns|  1926|    

                                                                                

### Hit Leaders Season Champs Leaderboard

In [18]:
window_for_hit_leaders = Window.partitionBy("Season",).orderBy(F.desc("total_hits"))


season_hit_leaders = full_df \
 .groupby('batter', 'Season').agg(
    F.sum(F.when(F.col('event type').isin(20,21,22,23), 1).otherwise(0)).alias('total_hits')
)\
    .withColumn('season_rank', F.rank().over(window_for_hit_leaders)) \
    .filter(F.col('season_rank') == 1) \
    .drop('season_rank')

bio_select   = bio_df.select('PLAYERID', 'NICKNAME', 'LAST')

season_hit_leaders = season_hit_leaders \
                    .select( 'Season', 'total_hits', 'batter') \
                    .groupby('batter') \
                    .agg(F.count('batter').alias('Total Hitting Leaders'))

season_hit_leaders = season_hit_leaders \
                    .join(bio_select, bio_select.PLAYERID == season_hit_leaders.batter ) \
                    .drop('PLAYERID', 'batter') \
                    .sort('Total Hitting Leaders', ascending=False )




season_hit_leaders.show(season_hit_leaders.count())



+---------------------+------------+--------------+
|Total Hitting Leaders|    NICKNAME|          LAST|
+---------------------+------------+--------------+
|                    7|      Ichiro|        Suzuki|
|                    7|        Pete|          Rose|
|                    5|        Tony|         Gwynn|
|                    3|       Kirby|       Puckett|
|                    3|      Rogers|       Hornsby|
|                    3|          Ty|          Cobb|
|                    2|      Richie|       Ashburn|
|                    2|        Trea|        Turner|
|                    2|       Frank|     McCormick|
|                    2|        Jose|        Altuve|
|                    2|      George|         Brett|
|                    2|      George|         Burns|
|                    2|        Vada|        Pinson|
|                    2|        Dale|      Mitchell|
|                    2|        Paul|       Molitor|
|                    2|        Stan|        Musial|
|           

                                                                                

# Immaculate Grid Query Number 770

#### *What is a Player in the Hall of Fame that has a 100+ Rbi Season?*

In [19]:
bio_hof_select = bio_df.select('PLAYERID', 'NICKNAME', 'LAST', 'HOF') \
                .filter(F.col('HOF') == 'HOF')


rbi_hundred_season_batters = full_df \
                            .groupby('batter', 'Season') \
                            .agg(F.sum(F.col('RBI on play')).alias('Total RBIs')) \
                            .filter(F.col('Total RBIs') >= 100)

rbi_hundred_season_batters = rbi_hundred_season_batters \
                            .join(bio_hof_select, bio_hof_select.PLAYERID == rbi_hundred_season_batters.batter, 'inner') \
                            .select('NICKNAME', 'LAST', 'Season', 'Total RBIs') \
                            .show()





+--------+-----------+------+----------+
|NICKNAME|       LAST|Season|Total RBIs|
+--------+-----------+------+----------+
|Home Run|      Baker|  1912|       134|
|      Ty|       Cobb|  1915|       101|
|   Billy|   Williams|  1970|       129|
|    Tony|      Perez|  1970|       129|
|     Joe|      Torre|  1971|       137|
|  Johnny|      Bench|  1972|       125|
|     Ted|    Simmons|  1975|       100|
|    Tony|      Perez|  1975|       109|
|     Joe|     Morgan|  1976|       111|
|    Dave|     Parker|  1978|       117|
|    Mike|    Schmidt|  1979|       114|
|  Harmon|  Killebrew|  1970|       113|
|  Harmon|  Killebrew|  1971|       119|
|    Carl|Yastrzemski|  1977|       102|
|  Reggie|    Jackson|  1977|       110|
|     Jim|       Rice|  1978|       139|
|     Jim|       Rice|  1979|       130|
|    Bill|      Terry|  1930|       126|
|     Mel|        Ott|  1931|       106|
|   Chuck|      Klein|  1932|       131|
+--------+-----------+------+----------+
only showing top

                                                                                

#### *What is a Player on the Braves in the Hall of Fame?*

In [20]:
bio_hof_select = bio_df.select('PLAYERID', 'NICKNAME', 'LAST', 'HOF') \
                .filter(F.col('HOF') == 'HOF')


 # batting team.  A one-character identification of the team at
 #                    bat ("0" for the visiting team and "1" for the
 #                    home team).

braves_non_pitchers = full_df \
    .filter(
        (F.col('batting team') == 1) & (F.col('home team') == 'ATL')
    ) \
    .select('batter') \
    .distinct() \
    .withColumnRenamed('batter', 'player')


braves_pitchers = full_df \
    .filter(
        (F.col('batting team') == 0) & (F.col('visiting team') == 'ATL')
    ) \
    .select('pitcher') \
    .distinct() \
    .withColumnRenamed('pitcher', 'player')


braves_players = braves_non_pitchers.unionByName(braves_pitchers)


braves_players_hof = braves_players \
                    .join(bio_hof_select, bio_hof_select.PLAYERID == braves_players.player, 'inner') \
                    .select('NICKNAME', 'LAST').show()








+--------+--------+
|NICKNAME|    LAST|
+--------+--------+
| Orlando|  Cepeda|
|    Hoyt| Wilhelm|
|    Hank|   Aaron|
|    Phil|  Niekro|
|    Tony|La Russa|
|    Greg|  Maddux|
|    John|  Smoltz|
| Chipper|   Jones|
|     Tom| Glavine|
|     Joe|   Torre|
|   Eddie| Mathews|
| Gaylord|   Perry|
|   Bruce|  Sutter|
|     Ted| Simmons|
|    Fred| McGriff|
|     Tom|  Seaver|
|   Steve| Carlton|
|    Hoyt| Wilhelm|
| Gaylord|   Perry|
|    Juan|Marichal|
+--------+--------+
only showing top 20 rows



                                                                                

#### *What is a Player who has played for the Dodgers in the HOF?*

In [66]:
bio_hof_select = bio_df.select('PLAYERID', 'NICKNAME', 'LAST', 'HOF') \
                .filter(F.col('HOF') == 'HOF')


 # batting team.  A one-character identification of the team at
 #                    bat ("0" for the visiting team and "1" for the
 #                    home team).

braves_non_pitchers = full_df \
    .filter(
        (F.col('batting team') == 1) & (F.col('home team') == 'LAN')
    ) \
    .select('batter') \
    .distinct() \
    .withColumnRenamed('batter', 'player')


braves_pitchers = full_df \
    .filter(
        (F.col('batting team') == 0) & (F.col('visiting team') == 'LAN')
    ) \
    .select('pitcher') \
    .distinct() \
    .withColumnRenamed('pitcher', 'player')


braves_players = braves_non_pitchers.unionByName(braves_pitchers)


braves_players_hof = braves_players \
                    .join(bio_hof_select, bio_hof_select.PLAYERID == braves_players.player, 'inner') \
                    .select('NICKNAME', 'LAST').show()








+--------+---------+
|NICKNAME|     LAST|
+--------+---------+
|    Dick|    Allen|
|     Don|   Sutton|
|   Frank| Robinson|
|    Juan| Marichal|
|  Adrian|   Beltre|
|  Rickey|Henderson|
|    Fred|  McGriff|
|    Greg|   Maddux|
|     Jim|    Thome|
|   Sandy|   Koufax|
|     Don| Drysdale|
|    Duke|   Snider|
|     Gil|   Hodges|
|     Jim|  Bunning|
| Pee Wee|    Reese|
|   Eddie|   Murray|
|    Gary|   Carter|
|   Pedro| Martinez|
|    Mike|   Piazza|
|     Tom|   Seaver|
+--------+---------+
only showing top 20 rows



                                                                                

#### *What is a Player that has played on both the Tigers and the Braves?*

In [70]:
# bio_hof_select = bio_df.select('PLAYERID', 'NICKNAME', 'LAST', 'HOF') \
#                 .filter(F.col('HOF') == 'HOF')


 # batting team.  A one-character identification of the team at
 #                    bat ("0" for the visiting team and "1" for the
 #                    home team).

tigers_non_pitchers = full_df \
    .filter(
        (F.col('batting team') == 1) & (F.col('home team') == 'DET')
    ) \
    .select('batter') \
    .distinct() \
    .withColumnRenamed('batter', 'player')


tigers_pitchers = full_df \
    .filter(
        (F.col('batting team') == 0) & (F.col('visiting team') == 'DET')
    ) \
    .select('pitcher') \
    .distinct() \
    .withColumnRenamed('pitcher', 'player')

tigers_players = tigers_non_pitchers.unionByName(tigers_pitchers)



braves_non_pitchers = full_df \
    .filter(
        (F.col('batting team') == 1) & (F.col('home team') == 'ATL')
    ) \
    .select('batter') \
    .distinct() \
    .withColumnRenamed('batter', 'player')


braves_pitchers = full_df \
    .filter(
        (F.col('batting team') == 0) & (F.col('visiting team') == 'ATL')
    ) \
    .select('pitcher') \
    .distinct() \
    .withColumnRenamed('pitcher', 'player')



braves_players = braves_non_pitchers.union(braves_pitchers)


both_teams = braves_players \
             .join(tigers_players, tigers_players.player == braves_players.player, 'inner' ) \
                .drop(tigers_players.player)


both_teams = both_teams \
            .join(bio_select, bio_select.PLAYERID == both_teams.player, 'inner') \
            .select('NICKNAME', 'LAST').show()









                                                                                

+--------+----------+
|NICKNAME|      LAST|
+--------+----------+
|  Mickey|    Lolich|
|     Joe|    Niekro|
|     Joe|    Niekro|
|    Earl|    Wilson|
|   Denny|    McLain|
|   Daryl| Patterson|
|    Fred|  Scherman|
|     Joe|   Coleman|
|     Bob|    Didier|
|  Woodie|    Fryman|
|    Fred|Holdsworth|
|    Jack|    Pierce|
| Charlie|    Spikes|
|    Luis|   Polonia|
|  Robert|      Fick|
|    Omar|   Infante|
| Randall|     Simon|
|  George|   Lombard|
|    Greg|    Norton|
|   Brent|   Clevlen|
+--------+----------+
only showing top 20 rows



#### *What is a Player who has played for both the Dodgers and Tigers*

In [72]:


# batting team.  A one-character identification of the team at
 #                    bat ("0" for the visiting team and "1" for the
 #                    home team).

tigers_non_pitchers = full_df \
    .filter(
        (F.col('batting team') == 1) & (F.col('home team') == 'DET')
    ) \
    .select('batter') \
    .distinct() \
    .withColumnRenamed('batter', 'player')


tigers_pitchers = full_df \
    .filter(
        (F.col('batting team') == 0) & (F.col('visiting team') == 'DET')
    ) \
    .select('pitcher') \
    .distinct() \
    .withColumnRenamed('pitcher', 'player')

tigers_players = tigers_non_pitchers.unionByName(tigers_pitchers)



braves_non_pitchers = full_df \
    .filter(
        (F.col('batting team') == 1) & (F.col('home team') == 'LAN')
    ) \
    .select('batter') \
    .distinct() \
    .withColumnRenamed('batter', 'player')


braves_pitchers = full_df \
    .filter(
        (F.col('batting team') == 0) & (F.col('visiting team') == 'LAN')
    ) \
    .select('pitcher') \
    .distinct() \
    .withColumnRenamed('pitcher', 'player')



braves_players = braves_non_pitchers.union(braves_pitchers)


both_teams = braves_players \
             .join(tigers_players, tigers_players.player == braves_players.player, 'inner' ) \
                .drop(tigers_players.player)


both_teams = both_teams \
            .join(bio_select, bio_select.PLAYERID == both_teams.player, 'inner') \
            .select('NICKNAME', 'LAST').show()









                                                                                

+--------+-----------+
|NICKNAME|       LAST|
+--------+-----------+
|  Mickey|     Lolich|
|     Joe|     Niekro|
|    Bill|     Denehy|
|   Denny|     McLain|
|   Daryl|  Patterson|
|    Fred|   Scherman|
|     Joe|    Coleman|
|     Ron| Perranoski|
|    Duke|       Sims|
|     Tom|     Haller|
|   Frank|     Howard|
|  Woodie|     Fryman|
|    Fred| Holdsworth|
|    Gene|    Michael|
|    Kirk|     Gibson|
|    Juan|Encarnacion|
|   Karim|     Garcia|
|    Brad|     Ausmus|
|   Hiram|  Bocachica|
|   Roger|     Cedeno|
+--------+-----------+
only showing top 20 rows



#### *What is a player from the Detroit Tigers with a 100+ RBI season*

In [79]:
# bio_hof_select = bio_df.select('PLAYERID', 'NICKNAME', 'LAST', 'HOF') \
#                 .filter(F.col('HOF') == 'HOF')

tigers_non_pitchers = full_df \
    .filter(
        (F.col('batting team') == 1) & (F.col('home team') == 'DET')
    ) \
    .select('batter') \
    .distinct() \
    .withColumnRenamed('batter', 'player')





rbi_hundred_season_batters_det = full_df \
                            .join(tigers_non_pitchers, tigers_non_pitchers.player == full_df.batter, 'inner') \
                            .groupby('batter', 'Season') \
                            .agg(F.sum(F.col('RBI on play')).alias('Total RBIs')) \
                            .filter(F.col('Total RBIs') >= 100)


rbi_hundred_season_batters_det = rbi_hundred_season_batters_det \
                            .join(bio_select, bio_select.PLAYERID == rbi_hundred_season_batters_det.batter, 'inner') \
                            .select('NICKNAME', 'LAST', 'Season', 'Total RBIs') \
                            .show()






                                                                                

+--------+---------+------+----------+
|NICKNAME|     LAST|Season|Total RBIs|
+--------+---------+------+----------+
|   Steve|     Kemp|  1979|       105|
|    Dale|Alexander|  1930|       123|
|     Gee|   Walker|  1937|       110|
|     Gee|   Walker|  1939|       103|
| Charlie|   Keller|  1941|       122|
| Charlie|   Keller|  1942|       104|
|   Marty|  McManus|  1922|       107|
|    Dale|Alexander|  1929|       118|
|   Steve|     Kemp|  1980|       101|
|    Alan| Trammell|  1987|       105|
|    Matt|   Stairs|  1998|       106|
|    Matt|   Stairs|  1999|       102|
|    J.D.| Martinez|  2023|       103|
|    J.D.| Martinez|  2015|       102|
|    J.D.| Martinez|  2017|       104|
|    J.D.| Martinez|  2018|       130|
|    J.D.| Martinez|  2019|       105|
|     Sam| Crawford|  1912|       116|
|     Sam| Crawford|  1915|       116|
|   Goose|   Goslin|  1930|       126|
+--------+---------+------+----------+
only showing top 20 rows



#### *What is a player on the Pirates with a 100+ RBI Season?*

In [80]:
# bio_hof_select = bio_df.select('PLAYERID', 'NICKNAME', 'LAST', 'HOF') \
#                 .filter(F.col('HOF') == 'HOF')

pit_non_pitchers = full_df \
    .filter(
        (F.col('batting team') == 1) & (F.col('home team') == 'PIT')
    ) \
    .select('batter') \
    .distinct() \
    .withColumnRenamed('batter', 'player')





rbi_hundred_season_batters_pir = full_df \
                            .join(pit_non_pitchers, pit_non_pitchers.player == full_df.batter, 'inner') \
                            .groupby('batter', 'Season') \
                            .agg(F.sum(F.col('RBI on play')).alias('Total RBIs')) \
                            .filter(F.col('Total RBIs') >= 100)


rbi_hundred_season_batters_pir = rbi_hundred_season_batters_pir \
                            .join(bio_select, bio_select.PLAYERID == rbi_hundred_season_batters_pir.batter, 'inner') \
                            .select('NICKNAME', 'LAST', 'Season', 'Total RBIs') \
                            .show()








+------------+-------+------+----------+
|    NICKNAME|   LAST|Season|Total RBIs|
+------------+-------+------+----------+
|       Steve|   Kemp|  1979|       105|
|         Gus|   Suhr|  1930|       108|
|       Chuck|  Klein|  1930|       155|
|       Chuck|  Klein|  1931|       116|
|       Chuck|  Klein|  1932|       131|
|       Chuck|  Klein|  1933|       118|
|         Gus|   Suhr|  1936|       112|
|       Chuck|  Klein|  1936|       100|
|       Ralph|  Kiner|  1953|       111|
|      Walker| Cooper|  1947|       118|
|       Ralph|  Kiner|  1947|       120|
|       Ralph|  Kiner|  1948|       111|
|       Ralph|  Kiner|  1949|       104|
|High Pockets|  Kelly|  1921|       127|
|High Pockets|  Kelly|  1922|       106|
|High Pockets|  Kelly|  1923|       100|
|High Pockets|  Kelly|  1924|       136|
|       Chuck|  Klein|  1929|       141|
|       Bobby|Bonilla|  1988|       100|
|       Steve|   Kemp|  1980|       101|
+------------+-------+------+----------+
only showing top

                                                                                

#### *What is a player that has played for both the Pirates and the Dodgers?*

In [81]:


# batting team.  A one-character identification of the team at
 #                    bat ("0" for the visiting team and "1" for the
 #                    home team).

tigers_non_pitchers = full_df \
    .filter(
        (F.col('batting team') == 1) & (F.col('home team') == 'PIT')
    ) \
    .select('batter') \
    .distinct() \
    .withColumnRenamed('batter', 'player')


tigers_pitchers = full_df \
    .filter(
        (F.col('batting team') == 0) & (F.col('visiting team') == 'PIT')
    ) \
    .select('pitcher') \
    .distinct() \
    .withColumnRenamed('pitcher', 'player')

tigers_players = tigers_non_pitchers.unionByName(tigers_pitchers)



braves_non_pitchers = full_df \
    .filter(
        (F.col('batting team') == 1) & (F.col('home team') == 'LAN')
    ) \
    .select('batter') \
    .distinct() \
    .withColumnRenamed('batter', 'player')


braves_pitchers = full_df \
    .filter(
        (F.col('batting team') == 0) & (F.col('visiting team') == 'LAN')
    ) \
    .select('pitcher') \
    .distinct() \
    .withColumnRenamed('pitcher', 'player')



braves_players = braves_non_pitchers.union(braves_pitchers)


both_teams = braves_players \
             .join(tigers_players, tigers_players.player == braves_players.player, 'inner' ) \
                .drop(tigers_players.player)


both_teams = both_teams \
            .join(bio_select, bio_select.PLAYERID == both_teams.player, 'inner') \
            .select('NICKNAME', 'LAST').show()









                                                                                

+--------+----------+
|NICKNAME|      LAST|
+--------+----------+
|  Nelson|    Briles|
| Orlando|      Pena|
|     Joe|    Gibbon|
|   Bruce|Dal Canton|
|    John|      Lamb|
|     Bob|     Moose|
|    Dock|     Ellis|
|    Dave|    Giusti|
|  Mudcat|     Grant|
|  Mudcat|     Grant|
|     Bob|     Veale|
|    Gene|    Garber|
|      Al|    McBean|
|      Al|    McBean|
|     Bob|   Johnson|
|     Jim|    Nelson|
|    Luke|    Walker|
|     Vic| Davalillo|
|      Al|    Oliver|
|  George|    Brunet|
+--------+----------+
only showing top 20 rows



#### *What is a player that has played for both the Braves and the Pirates?*

In [82]:


# batting team.  A one-character identification of the team at
 #                    bat ("0" for the visiting team and "1" for the
 #                    home team).

tigers_non_pitchers = full_df \
    .filter(
        (F.col('batting team') == 1) & (F.col('home team') == 'PIT')
    ) \
    .select('batter') \
    .distinct() \
    .withColumnRenamed('batter', 'player')


tigers_pitchers = full_df \
    .filter(
        (F.col('batting team') == 0) & (F.col('visiting team') == 'PIT')
    ) \
    .select('pitcher') \
    .distinct() \
    .withColumnRenamed('pitcher', 'player')

tigers_players = tigers_non_pitchers.unionByName(tigers_pitchers)



braves_non_pitchers = full_df \
    .filter(
        (F.col('batting team') == 1) & (F.col('home team') == 'ATL')
    ) \
    .select('batter') \
    .distinct() \
    .withColumnRenamed('batter', 'player')


braves_pitchers = full_df \
    .filter(
        (F.col('batting team') == 0) & (F.col('visiting team') == 'ATL')
    ) \
    .select('pitcher') \
    .distinct() \
    .withColumnRenamed('pitcher', 'player')



braves_players = braves_non_pitchers.union(braves_pitchers)


both_teams = braves_players \
             .join(tigers_players, tigers_players.player == braves_players.player, 'inner' ) \
                .drop(tigers_players.player)


both_teams = both_teams \
            .join(bio_select, bio_select.PLAYERID == both_teams.player, 'inner') \
            .select('NICKNAME', 'LAST').show()









                                                                                

+--------+----------+
|NICKNAME|      LAST|
+--------+----------+
|  Nelson|    Briles|
| Orlando|      Pena|
|  George|    Kopacz|
|     Joe|    Gibbon|
|   Bruce|Dal Canton|
|   Bruce|Dal Canton|
|     Bob|     Moose|
|    Dock|     Ellis|
|    Dave|    Giusti|
|  Mudcat|     Grant|
|     Bob|     Veale|
|    Gene|    Garber|
|    Gene|    Garber|
|      Al|    McBean|
|     Bob|   Johnson|
|     Bob|   Johnson|
|    Luke|    Walker|
|   Steve|     Blass|
|   Chuck|    Goggin|
|   Ramon| Hernandez|
+--------+----------+
only showing top 20 rows



#### Grid Results using options from PySpark Queries
#### ![Immaculate Grid Complete](https://raw.githubusercontent.com/jwolfe972/Spark-Projects/main/spark-learning-notebooks/images/mlb-imgs/immaculate_grid_complete.png)

