# MLB Scraper

This Python module provides a class, `MLB_Scrape`, that interacts with the MLB Stats API to retrieve various types of baseball data. Most results are processed and returned as Polars DataFrames for easy manipulation and analysis.

## Requirements

- Python 3.x
- `requests` library
- `polars` library
- `numpy` library
- `tqdm` library
- `pytz` library

You can install the required libraries using pip:

```sh
pip install requests polars numpy tqdm pytz
```

## Usage

Import the `MLB_Scrape` class from the module and initialize the scraper.

```python
import requests
import polars as pl
import numpy as np
from datetime import datetime
import pybaseball as pyb  # not in the pip command above; install with `pip install pybaseball`
from tqdm import tqdm
from pytz import timezone
import re
from concurrent.futures import ThreadPoolExecutor, as_completed
```

```python
# Import the MLB_Scrape class from the module
from api_scraper import MLB_Scrape

# Initialize the scraper
scraper = MLB_Scrape()
```

#### get_sport_id()

Retrieves the list of sports from the MLB Stats API and processes it into a Polars DataFrame.

```python
# Call the get_sport_id method
sport_ids = scraper.get_sport_id()
print(sport_ids)
print(sport_ids['name'].to_list())
```

```text
shape: (19, 7)
┌──────┬──────┬─────────────────────┬────────────────────┬──────────────┬───────────┬──────────────┐
│ id   ┆ code ┆ link                ┆ name               ┆ abbreviation ┆ sortOrder ┆ activeStatus │
│ ---  ┆ ---  ┆ ---                 ┆ ---                ┆ ---          ┆ ---       ┆ ---          │
│ i64  ┆ str  ┆ str                 ┆ str                ┆ str          ┆ i64       ┆ bool         │
╞══════╪══════╪═════════════════════╪════════════════════╪══════════════╪═══════════╪══════════════╡
│ 1    ┆ mlb  ┆ /api/v1/sports/1    ┆ Major League       ┆ MLB          ┆ 11        ┆ true         │
│      ┆      ┆                     ┆ Baseball           ┆              ┆           ┆              │
│ 11   ┆ aaa  ┆ /api/v1/sports/11   ┆ Triple-A           ┆ AAA          ┆ 101       ┆ true         │
│ 12   ┆ aax  ┆ /api/v1/sports/12   ┆ Double-A           ┆ AA           ┆ 201       ┆ true         │
│ 13   ┆ afa  ┆ /api/v1/sports/13   ┆ High-A             ┆ A+           ┆ 30
```

#### get_sport_id_check()

Checks whether the provided sport ID exists in the list of sports retrieved from the MLB Stats API.

```python
# Call the get_sport_id_check method
is_valid = scraper.get_sport_id_check(sport_id=1)
print(is_valid)
```

```text
True
```

#### get_schedule()

Retrieves the schedule of baseball games for the given years (`year_input`), sport IDs (`sport_id`), and game types (`game_type`).

```python
# Call the get_schedule method
# sport_id 22 is College Baseball (see the get_leagues() output below)
schedule = scraper.get_schedule(year_input=[2025], sport_id=[22], game_type=['R'])
print(schedule)
```

```text
shape: (866, 8)
┌─────────┬──────────┬────────────┬───────────────┬──────────────┬───────┬──────────┬──────────────┐
│ game_id ┆ time     ┆ date       ┆ away          ┆ home         ┆ state ┆ venue_id ┆ venue_name   │
│ ---     ┆ ---      ┆ ---        ┆ ---           ┆ ---          ┆ ---   ┆ ---      ┆ ---          │
│ i64     ┆ str      ┆ date       ┆ str           ┆ str          ┆ str   ┆ i64      ┆ str          │
╞═════════╪══════════╪════════════╪═══════════════╪══════════════╪═══════╪══════════╪══════════════╡
│ 791940  ┆ 02:30 PM ┆ 2025-01-31 ┆ Arkansas Tech ┆ Jefferson    ┆ F     ┆ 2392     ┆ Daikin Park  │
│         ┆          ┆            ┆ Wonder Boys   ┆ Rams         ┆       ┆          ┆              │
│ 791941  ┆ 11:00 AM ┆ 2025-01-31 ┆ Mississippi   ┆ Arkansas-Mon ┆ F     ┆ 2392     ┆ Daikin Park  │
│         ┆          ┆            ┆ College       ┆ ticello Boll ┆       ┆          ┆              │
│         ┆          ┆            ┆ Choctaws      ┆ Weevi…       ┆       ┆ 
```

#### get_data() and get_data_df()

`get_data()` retrieves live game data for a list of game IDs, and `get_data_df()` converts the resulting list of game-data JSON objects into a Polars DataFrame.

```python
# Call the get_data method
game_data = scraper.get_data(game_list_input=[745444])
# Call the get_data_df method
data_df = scraper.get_data_df(data_list=game_data)
print(data_df)
print(data_df.columns)
```

```text
This May Take a While. Progress Bar shows Completion of Data Retrieval.

Processing: 100%|██████████| 1/1 [00:00<00:00,  3.85iteration/s]

Converting Data to Dataframe.
shape: (304, 78)
┌─────────┬────────────┬───────────┬─────────────┬───┬────────────┬──────┬────────────┬────────────┐
│ game_id ┆ game_date  ┆ batter_id ┆ batter_name ┆ … ┆ event_type ┆ rbi  ┆ away_score ┆ home_score │
│ ---     ┆ ---        ┆ ---       ┆ ---         ┆   ┆ ---        ┆ ---  ┆ ---        ┆ ---        │
│ i64     ┆ str        ┆ i64       ┆ str         ┆   ┆ str        ┆ i64  ┆ i64        ┆ i64        │
╞═════════╪════════════╪═══════════╪═════════════╪═══╪════════════╪══════╪════════════╪════════════╡
│ 745444  ┆ 2024-03-20 ┆ 605141    ┆ Mookie      ┆ … ┆ null       ┆ null ┆ null       ┆ null       │
│         ┆            ┆           ┆ Betts       ┆   ┆            ┆      ┆            ┆            │
│ 745444  ┆ 2024-03-20 ┆ 605141    ┆ Mookie      ┆ … ┆ null       ┆ null ┆ null       ┆ null       │
│         ┆            ┆           ┆ Betts       ┆   ┆            ┆      ┆            ┆            │
│ 745444  ┆ 2024-03-20 ┆ 605141    ┆ Mookie 
```




```python
# Get the unique values of the event column as a list and print them
print(data_df['event'].unique().to_list())
```

```text
['Field Error', 'Groundout', None, 'Sac Fly', 'Flyout', 'Grounded Into DP', 'Strikeout', 'Hit By Pitch', 'Double Play', 'Lineout', 'Single', 'Forceout', 'Walk', 'Fielders Choice', 'Pop Out']
```

#### get_teams()

Retrieves team information from the MLB Stats API and processes it into a Polars DataFrame. The results are not limited to MLB clubs; the output below includes college teams such as Georgia Tech.


```python
# Call the get_teams method
teams = scraper.get_teams()
print(teams)
```

```text
shape: (761, 10)
┌─────────┬────────────┬───────────┬───────────┬───┬───────────┬───────────┬───────────┬───────────┐
│ team_id ┆ city       ┆ name      ┆ franchise ┆ … ┆ parent_or ┆ league_id ┆ league_na ┆ parent_or │
│ ---     ┆ ---        ┆ ---       ┆ ---       ┆   ┆ g         ┆ ---       ┆ me        ┆ g_abbrevi │
│ i64     ┆ str        ┆ str       ┆ str       ┆   ┆ ---       ┆ i64       ┆ ---       ┆ ation     │
│         ┆            ┆           ┆           ┆   ┆ str       ┆           ┆ str       ┆ ---       │
│         ┆            ┆           ┆           ┆   ┆           ┆           ┆           ┆ str       │
╞═════════╪════════════╪═══════════╪═══════════╪═══╪═══════════╪═══════════╪═══════════╪═══════════╡
│ 100     ┆ Georgia    ┆ Yellow    ┆ Georgia   ┆ … ┆ Office of ┆ 107       ┆ College   ┆ null      │
│         ┆ Tech       ┆ Jackets   ┆ Tech      ┆   ┆ the Commi ┆           ┆ Baseball  ┆           │
│         ┆ Yellow     ┆           ┆ Yellow    ┆   ┆ ssioner   ┆          
```

#### get_leagues()
Retrieves league information from the MLB Stats API, including spring training, minor, and college leagues, and processes it into a Polars DataFrame.

```python
# Call the get_leagues method
leagues = scraper.get_leagues()
print(leagues)
print(leagues['league_name'].to_list())
```


```text
shape: (117, 4)
┌───────────┬─────────────────────────────────┬─────────────────────┬──────────┐
│ league_id ┆ league_name                     ┆ league_abbreviation ┆ sport_id │
│ ---       ┆ ---                             ┆ ---                 ┆ ---      │
│ i64       ┆ str                             ┆ str                 ┆ i64      │
╞═══════════╪═════════════════════════════════╪═════════════════════╪══════════╡
│ 103       ┆ American League                 ┆ AL                  ┆ 1        │
│ 104       ┆ National League                 ┆ NL                  ┆ 1        │
│ 114       ┆ Cactus League                   ┆ CL                  ┆ null     │
│ 115       ┆ Grapefruit League               ┆ GL                  ┆ null     │
│ 117       ┆ International League            ┆ INT                 ┆ 11       │
│ …         ┆ …                               ┆ …                   ┆ …        │
│ 108       ┆ College Baseball                ┆ CBB                 ┆ 22       │
│ 587       
```

```python
# Filter the leagues DataFrame for the 'Arizona Fall League'
arizona_fall_league = leagues.filter(leagues['league_name'] == "Arizona Fall League")

# Print the full row(s) to see all details
print(arizona_fall_league)

# Or print just the sport_id
print(arizona_fall_league['sport_id'])
```

```text
shape: (1, 4)
┌───────────┬─────────────────────┬─────────────────────┬──────────┐
│ league_id ┆ league_name         ┆ league_abbreviation ┆ sport_id │
│ ---       ┆ ---                 ┆ ---                 ┆ ---      │
│ i64       ┆ str                 ┆ str                 ┆ i64      │
╞═══════════╪═════════════════════╪═════════════════════╪══════════╡
│ 119       ┆ Arizona Fall League ┆ AFL                 ┆ 17       │
└───────────┴─────────────────────┴─────────────────────┴──────────┘
shape: (1,)
Series: 'sport_id' [i64]
[
	17
]
```


#### get_player_games_list()
Retrieves a list of game IDs for a specific player in a given season.
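Because `get_data()` fetches every game in the list it is given, a full season of IDs can take a while. One option is to process the IDs in batches. The sketch below shows a hypothetical `chunked` helper; the call to `scraper.get_data()` appears only as a comment because it needs network access.

```python
from typing import Iterator


def chunked(ids: list[int], size: int) -> Iterator[list[int]]:
    """Yield consecutive slices of `ids` with at most `size` items each."""
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(0, len(ids), size):
        yield ids[start:start + size]


game_ids = [745444, 746175, 746165, 746167, 746168]
for batch in chunked(game_ids, 2):
    print(batch)
    # e.g. data = scraper.get_data(game_list_input=batch)
```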

```python
# Call the get_player_games_list method (660271 is Shohei Ohtani)
player_games = scraper.get_player_games_list(player_id=660271, season=2024, pitching=False)
print(player_games)
```

```text
[745444, 746175, 746165, 746167, 746168, 746166, 746170, 746169, 746163, 746897, 746896, 746895, 745925, 745924, 745923, 746162, 746161, 746164, 746158, 746157, 746159, 746160, 746156, 746154, 744863, 744864, 744867, 744949, 744946, 744947, 747211, 747210, 746152, 746155, 746153, 746149, 746147, 746151, 745421, 745422, 745343, 745342, 745341, 746150, 746148, 746142, 746144, 746146, 746145, 746143, 746713, 746710, 746711, 745818, 745817, 746140, 746138, 746141, 745497, 745494, 745495, 745737, 745735, 745736, 746139, 746137, 746136, 746134, 746135, 746130, 746543, 746542, 746538, 746539, 746131, 746132, 746780, 746775, 746777, 745319, 745317, 745320, 746133, 746128, 746126, 746127, 746129, 746122, 745557, 745556, 745558, 746450, 746447, 746449, 746123, 746121, 746125, 746124, 746118, 746119, 746117, 746361, 746362, 746364, 745389, 745386, 745631, 745636, 745627, 746116, 746120, 746115, 746113, 746112, 746114, 745953, 745954, 745951, 745950, 745144, 745140, 745139, 746109, 746107, 746110,
```

#### get_game_types()
Retrieves the different types of MLB games from the MLB Stats API and processes them into a Polars DataFrame.

```python
# Call the get_game_types method
game_types = scraper.get_game_types()

# There are 12 game types in total; the printed preview elides the middle rows
print(game_types)
```

```text
shape: (12, 2)
┌─────┬────────────────────────────┐
│ id  ┆ description                │
│ --- ┆ ---                        │
│ str ┆ str                        │
╞═════╪════════════════════════════╡
│ S   ┆ Spring Training            │
│ R   ┆ Regular Season             │
│ F   ┆ Wild Card                  │
│ D   ┆ Division Series            │
│ L   ┆ League Championship Series │
│ …   ┆ …                          │
│ N   ┆ Nineteenth Century Series  │
│ P   ┆ Playoffs                   │
│ A   ┆ All-Star Game              │
│ I   ┆ Intrasquad                 │
│ E   ┆ Exhibition                 │
└─────┴────────────────────────────┘
```


#### get_players()
Retrieves player information from the MLB Stats API and processes it into a Polars DataFrame.

```python
df_player = scraper.get_players(sport_id=1, season=2025, game_type=['S'])
print(df_player)
print(df_player.columns)
```

```text
shape: (1_482, 6)
┌───────────┬────────────┬───────────┬──────────────────────┬──────────┬──────┐
│ player_id ┆ first_name ┆ last_name ┆ name                 ┆ position ┆ team │
│ ---       ┆ ---        ┆ ---       ┆ ---                  ┆ ---      ┆ ---  │
│ i64       ┆ str        ┆ str       ┆ str                  ┆ str      ┆ i64  │
╞═══════════╪════════════╪═══════════╪══════════════════════╪══════════╪══════╡
│ 434378    ┆ Justin     ┆ Verlander ┆ Justin Verlander     ┆ P        ┆ 137  │
│ 445276    ┆ Kenley     ┆ Jansen    ┆ Kenley Jansen        ┆ P        ┆ 108  │
│ 445926    ┆ Jesse      ┆ Chavez    ┆ Jesse Chavez         ┆ P        ┆ 144  │
│ 450203    ┆ Charles    ┆ Morton    ┆ Charlie Morton       ┆ P        ┆ 110  │
│ 453286    ┆ Maxwell    ┆ Scherzer  ┆ Max Scherzer         ┆ P        ┆ 141  │
│ …         ┆ …          ┆ …         ┆ …                    ┆ …        ┆ …    │
│ 828470    ┆ Nicholas   ┆ Wissman   ┆ Nick Wissman         ┆ P        ┆ 135  │
│ 828496    ┆ Jonathan
```

## Example

In this example, we return all the pitch-by-pitch data from Bryce Miller's games in 2025 Spring Training.

```python
import polars as pl

player_id = 683003  # Bryce Miller
season = 2025
player_games = scraper.get_player_games_list(player_id=player_id, season=season, game_type=['S'], pitching=True)

# Get data for Bryce Miller
data = scraper.get_data(game_list_input=player_games)
df = scraper.get_data_df(data_list=data)

# Print the data
print(df)
print(df.columns)
```

```text
This May Take a While. Progress Bar shows Completion of Data Retrieval.

Processing: 100%|██████████| 4/4 [00:00<00:00,  8.32iteration/s]

Converting Data to Dataframe.
shape: (1_180, 78)
┌─────────┬────────────┬───────────┬─────────────┬───┬────────────┬──────┬────────────┬────────────┐
│ game_id ┆ game_date  ┆ batter_id ┆ batter_name ┆ … ┆ event_type ┆ rbi  ┆ away_score ┆ home_score │
│ ---     ┆ ---        ┆ ---       ┆ ---         ┆   ┆ ---        ┆ ---  ┆ ---        ┆ ---        │
│ i64     ┆ str        ┆ i64       ┆ str         ┆   ┆ str        ┆ i64  ┆ i64        ┆ i64        │
╞═════════╪════════════╪═══════════╪═════════════╪═══╪════════════╪══════╪════════════╪════════════╡
│ 779038  ┆ 2025-02-25 ┆ 642201    ┆ Eli White   ┆ … ┆ null       ┆ null ┆ null       ┆ null       │
│ 779038  ┆ 2025-02-25 ┆ 642201    ┆ Eli White   ┆ … ┆ null       ┆ null ┆ null       ┆ null       │
│ 779038  ┆ 2025-02-25 ┆ 642201    ┆ Eli White   ┆ … ┆ single     ┆ 0    ┆ 0          ┆ 0          │
│ 779038  ┆ 2025-02-25 ┆ 805373    ┆ Nacho       ┆ … ┆ null       ┆ null ┆ null       ┆ null       │
│         ┆            ┆           ┆ Alvar
```




```python
import polars as pl

# Set player ID and season
player_id = 696149
season = 2025

# Get player games list for the specified season
player_games = scraper.get_player_games_list(player_id=player_id, season=season, game_type=['S'], pitching=True)

# Get the data using the game list
data = scraper.get_data(game_list_input=player_games)

# Convert the data into a DataFrame
df = scraper.get_data_df(data_list=data)

# Print the unique values of 'event'
print(df['event'].unique())

# Convert the unique events to a list
event_list = df['event'].unique().to_list()
print(event_list)
```




```text
This May Take a While. Progress Bar shows Completion of Data Retrieval.

Processing: 100%|██████████| 3/3 [00:00<00:00,  5.59iteration/s]

Converting Data to Dataframe.
shape: (18,)
Series: 'event' [str]
[
	"Groundout"
	"Home Run"
	"Double Play"
	"Flyout"
	"Strikeout"
	…
	"Walk"
	"Grounded Into DP"
	"Fielders Choice"
	"Hit By Pitch"
	"Sac Fly"
]
[None, 'Forceout', 'Walk', 'Grounded Into DP', 'Groundout', 'Field Error', 'Double Play', 'Strikeout', 'Fielders Choice', 'Hit By Pitch', 'Home Run', 'Lineout', 'Single', 'Runner Out', 'Double', 'Pop Out', 'Flyout', 'Sac Fly']
```


```python
# Specify the game date you're interested in (format: 'YYYY-MM-DD')
game_date_to_filter = '2025-02-23'

# Filter the DataFrame for the specific game_date
specific_game_data = df.filter(df['game_date'] == game_date_to_filter)

# Print the filtered data for the specific game
print(specific_game_data)
```

```text
shape: (247, 78)
┌─────────┬────────────┬───────────┬─────────────┬───┬────────────┬──────┬────────────┬────────────┐
│ game_id ┆ game_date  ┆ batter_id ┆ batter_name ┆ … ┆ event_type ┆ rbi  ┆ away_score ┆ home_score │
│ ---     ┆ ---        ┆ ---       ┆ ---         ┆   ┆ ---        ┆ ---  ┆ ---        ┆ ---        │
│ i64     ┆ str        ┆ i64       ┆ str         ┆   ┆ str        ┆ i64  ┆ i64        ┆ i64        │
╞═════════╪════════════╪═══════════╪═════════════╪═══╪════════════╪══════╪════════════╪════════════╡
│ 779039  ┆ 2025-02-23 ┆ 666397    ┆ Edouard     ┆ … ┆ single     ┆ 0    ┆ 0          ┆ 0          │
│         ┆            ┆           ┆ Julien      ┆   ┆            ┆      ┆            ┆            │
│ 779039  ┆ 2025-02-23 ┆ 686797    ┆ Brooks Lee  ┆ … ┆ null       ┆ null ┆ null       ┆ null       │
│ 779039  ┆ 2025-02-23 ┆ 686797    ┆ Brooks Lee  ┆ … ┆ grounded_i ┆ 0    ┆ 0          ┆ 0          │
│         ┆            ┆           ┆             ┆   ┆ nto_double ┆      ┆
```

With the DataFrame, we can keep only the pitches thrown by Bryce Miller this spring and then group them by pitch type to get the metrics for each pitch.

We will be getting the following metrics:
- pitches: Number of Pitches
- start_speed: Initial Velocity of the Pitch (mph)
- ivb: Induced Vertical Break (in)
- hb: Horizontal Break (in)
- spin_rate: Spin Rate (rpm)

```python
player_id = 683003
season = 2025
player_games = scraper.get_player_games_list(player_id=player_id, season=season, game_type=['S'], pitching=True)

# Get the data using the game list
data = scraper.get_data(game_list_input=player_games)

# Convert the data into a DataFrame
df = scraper.get_data_df(data_list=data)

# Group Bryce Miller's pitches by pitch type and game date (one row per pitch type per outing);
# 'proportion' is each row's share of all of his pitches in the data
grouped_df = (
    df.filter(pl.col("pitcher_id") == player_id)
    .group_by(['pitcher_id', 'pitch_type', 'game_date'])
    .agg([
        pl.col('is_pitch').drop_nans().count().alias('pitches'),
        pl.col('start_speed').drop_nans().mean().round(1).alias('start_speed'),
        # ivb, hb, spin_rate and extension are commented out in this run;
        # uncomment them to add those metrics to the table
        # pl.col('ivb').drop_nans().mean().round(1).alias('ivb'),
        # pl.col('hb').drop_nans().mean().round(1).alias('hb'),
        pl.col('vz0').drop_nans().mean().round(1).alias('vz0'),
        pl.col('vx0').drop_nans().mean().round(1).alias('vx0'),
        # pl.col('spin_rate').drop_nans().mean().round(0).alias('spin_rate'),
        # pl.col('extension').drop_nans().mean().round(1).alias('extension'),
    ])
    .with_columns(
        (pl.col('pitches') / pl.col('pitches').sum().over('pitcher_id')).round(3).alias('proportion')
    )
    .sort('proportion', descending=True)
)

# Display the grouped DataFrame
print(grouped_df)
```

```text
This May Take a While. Progress Bar shows Completion of Data Retrieval.

Processing: 100%|██████████| 4/4 [00:00<00:00,  5.34iteration/s]

Converting Data to Dataframe.
shape: (18, 8)
┌────────────┬────────────┬────────────┬─────────┬─────────────┬──────┬─────┬────────────┐
│ pitcher_id ┆ pitch_type ┆ game_date  ┆ pitches ┆ start_speed ┆ vz0  ┆ vx0 ┆ proportion │
│ ---        ┆ ---        ┆ ---        ┆ ---     ┆ ---         ┆ ---  ┆ --- ┆ ---        │
│ i64        ┆ str        ┆ str        ┆ u32     ┆ f64         ┆ f64  ┆ f64 ┆ f64        │
╞════════════╪════════════╪════════════╪═════════╪═════════════╪══════╪═════╪════════════╡
│ 683003     ┆ FF         ┆ 2025-03-13 ┆ 34      ┆ 95.8        ┆ -6.4 ┆ 6.1 ┆ 0.168      │
│ 683003     ┆ FF         ┆ 2025-03-07 ┆ 32      ┆ 96.7        ┆ -5.8 ┆ 6.0 ┆ 0.158      │
│ 683003     ┆ SL         ┆ 2025-03-07 ┆ 21      ┆ 90.0        ┆ -4.5 ┆ 2.9 ┆ 0.104      │
│ 683003     ┆ FF         ┆ 2025-03-02 ┆ 18      ┆ 97.3        ┆ -5.6 ┆ 5.8 ┆ 0.089      │
│ 683003     ┆ FF         ┆ 2025-02-25 ┆ 13      ┆ 96.9        ┆ -7.0 ┆ 5.9 ┆ 0.064      │
│ …          ┆ …          ┆ …          ┆ …   
```

```python
df_player = scraper.get_players(sport_id=1, season=2024, game_type=['R'])
name = 'Ryan Weathers'  # Replace with the name you're looking for
specific_player = df_player.filter(df_player['name'] == name)
print(specific_player)
```

```text
shape: (1, 10)
┌───────────┬────────────┬───────────┬───────────────┬───┬────────┬────────┬─────┬────────────┐
│ player_id ┆ first_name ┆ last_name ┆ name          ┆ … ┆ weight ┆ height ┆ age ┆ birthDate  │
│ ---       ┆ ---        ┆ ---       ┆ ---           ┆   ┆ ---    ┆ ---    ┆ --- ┆ ---        │
│ i64       ┆ str        ┆ str       ┆ str           ┆   ┆ i64    ┆ str    ┆ i64 ┆ str        │
╞═══════════╪════════════╪═══════════╪═══════════════╪═══╪════════╪════════╪═════╪════════════╡
│ 677960    ┆ Ryan       ┆ Weathers  ┆ Ryan Weathers ┆ … ┆ 230    ┆ 6' 1"  ┆ 25  ┆ 1999-12-17 │
└───────────┴────────────┴───────────┴───────────────┴───┴────────┴────────┴─────┴────────────┘
```


```python
import polars as pl

# Set player ID and the desired date
player_id = 683003
selected_date = '2024-03-30'  # Replace with your desired date (format: 'YYYY-MM-DD')
season = 2024
player_games = scraper.get_player_games_list(player_id=player_id, season=season, game_type=['R'], pitching=True)

# Get the data using the game list
data = scraper.get_data(game_list_input=player_games)

# Convert the data into a DataFrame
df = scraper.get_data_df(data_list=data)

# Perform the filtering, grouping, and aggregation
grouped_df = (
    df.filter(
        (pl.col("pitcher_id") == player_id) &
        (pl.col("game_date") == selected_date)  # Filter for the specific date
    )
    .group_by(['pitcher_id', 'pitch_type'])
    .agg([
        pl.col('is_pitch').drop_nans().count().alias('pitches'),
        pl.col('start_speed').drop_nans().mean().round(1).alias('start_speed'),
        pl.col('ivb').drop_nans().mean().round(1).alias('ivb'),
        pl.col('hb').drop_nans().mean().round(1).alias('hb'),
        pl.col('spin_rate').drop_nans().mean().round(0).alias('spin_rate'),
        pl.col('extension').drop_nans().mean().round(1).alias('extension'),
    ])
    .with_columns(
        (pl.col('pitches') / pl.col('pitches').sum().over('pitcher_id')).round(3).alias('proportion')
    )
    .sort('proportion', descending=True)
)

# Display the grouped DataFrame
print(grouped_df)
```


```text
This May Take a While. Progress Bar shows Completion of Data Retrieval.

Processing: 100%|██████████| 22/22 [00:01<00:00, 17.64iteration/s]

Converting Data to Dataframe.
shape: (4, 9)
┌────────────┬────────────┬─────────┬─────────────┬───┬──────┬───────────┬───────────┬────────────┐
│ pitcher_id ┆ pitch_type ┆ pitches ┆ start_speed ┆ … ┆ hb   ┆ spin_rate ┆ extension ┆ proportion │
│ ---        ┆ ---        ┆ ---     ┆ ---         ┆   ┆ ---  ┆ ---       ┆ ---       ┆ ---        │
│ i64        ┆ str        ┆ u32     ┆ f64         ┆   ┆ f64  ┆ f64       ┆ f64       ┆ f64        │
╞════════════╪════════════╪═════════╪═════════════╪═══╪══════╪═══════════╪═══════════╪════════════╡
│ 683003     ┆ SL         ┆ 40      ┆ 87.9        ┆ … ┆ -3.4 ┆ 2573.0    ┆ 7.5       ┆ 0.449      │
│ 683003     ┆ FF         ┆ 38      ┆ 97.1        ┆ … ┆ 8.8  ┆ 2562.0    ┆ 7.6       ┆ 0.427      │
│ 683003     ┆ CU         ┆ 6       ┆ 79.9        ┆ … ┆ -5.3 ┆ 2617.0    ┆ 7.3       ┆ 0.067      │
│ 683003     ┆ CH         ┆ 5       ┆ 91.0        ┆ … ┆ 12.8 ┆ 1736.0    ┆ 7.6       ┆ 0.056      │
└────────────┴────────────┴─────────┴─────────────┴───┴─
```




```python
import polars as pl

# Set player ID and the desired date
player_id = 677960  # Ryan Weathers
selected_date = '2024-03-30'  # Replace with your desired date (format: 'YYYY-MM-DD')
season = 2024
player_games = scraper.get_player_games_list(player_id=player_id, season=season, game_type=['R'], pitching=True)

# Get the data using the game list
data = scraper.get_data(game_list_input=player_games)

# Convert the data into a DataFrame
df = scraper.get_data_df(data_list=data)

# Perform the filtering, grouping, and aggregation
table_df = (
    df.filter(
        (pl.col("pitcher_id") == player_id) &
        (pl.col("game_date") == selected_date)
    )
    .group_by(['pitcher_id', 'pitch_type'])
    .agg([
        pl.col('is_pitch').drop_nans().count().alias('pitches'),
        pl.col('ivb').drop_nans().mean().round(1).alias('ivb'),
        pl.col('hb').drop_nans().mean().round(1).alias('hb'),
        pl.col('z0').drop_nans().mean().round(1).alias('vRel'),
        pl.col('x0').drop_nans().mean().round(1).alias('hRel'),
        # If non-whiff pitches are stored as null rather than False, mean() skips them
        # and returns 1.0; use .fill_null(False).mean() for a per-pitch whiff rate
        pl.col('is_whiff').mean().round(2).alias('is_whiff'),
    ])
    .with_columns(
        (pl.col('pitches') / pl.col('pitches').sum().over('pitcher_id')).round(3).alias('proportion')
    )
    .sort('proportion', descending=True)
)
print(table_df)
```

```text
This May Take a While. Progress Bar shows Completion of Data Retrieval.

Processing: 100%|██████████| 16/16 [00:00<00:00, 16.91iteration/s]

Converting Data to Dataframe.
shape: (5, 9)
┌────────────┬────────────┬─────────┬──────┬───┬──────┬──────┬──────────┬────────────┐
│ pitcher_id ┆ pitch_type ┆ pitches ┆ ivb  ┆ … ┆ vRel ┆ hRel ┆ is_whiff ┆ proportion │
│ ---        ┆ ---        ┆ ---     ┆ ---  ┆   ┆ ---  ┆ ---  ┆ ---      ┆ ---        │
│ i64        ┆ str        ┆ u32     ┆ f64  ┆   ┆ f64  ┆ f64  ┆ f64      ┆ f64        │
╞════════════╪════════════╪═════════╪══════╪═══╪══════╪══════╪══════════╪════════════╡
│ 677960     ┆ FF         ┆ 42      ┆ 16.8 ┆ … ┆ 5.6  ┆ 2.3  ┆ 1.0      ┆ 0.447      │
│ 677960     ┆ CH         ┆ 34      ┆ 5.8  ┆ … ┆ 5.5  ┆ 2.5  ┆ 1.0      ┆ 0.362      │
│ 677960     ┆ ST         ┆ 12      ┆ 1.3  ┆ … ┆ 5.6  ┆ 2.6  ┆ 1.0      ┆ 0.128      │
│ 677960     ┆ SI         ┆ 5       ┆ 12.1 ┆ … ┆ 5.5  ┆ 2.5  ┆ null     ┆ 0.053      │
│ 677960     ┆ SL         ┆ 1       ┆ 4.1  ┆ … ┆ 5.6  ┆ 2.4  ┆ null     ┆ 0.011      │
└────────────┴────────────┴─────────┴──────┴───┴──────┴──────┴──────────┴────────────┘
```


