# MLB pitching data download (2015-2024)

This notebook uses pybaseball to request pitching data for 2015 through 2024 from the Statcast source and saves each year, unprocessed, as a CSV file.

---

## Enable the cache

In [1]:
# Run csv_manager.py in this namespace (presumably where create_csv is defined)
exec(open('../src/csv_manager.py').read())
from pybaseball import cache

cache.enable()  # Enable pybaseball's cache
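
The `exec(open(...).read())` call above runs `csv_manager.py` directly in the notebook's namespace. As a hedged alternative, the same file could be loaded as a regular module with `importlib`, which keeps its names under one object. The helper name `load_module_from_path` is hypothetical and not part of the original project.

```python
import importlib.util
from pathlib import Path


def load_module_from_path(path, module_name=None):
    """Load a Python source file as a module object (hypothetical helper)."""
    path = Path(path)
    name = module_name or path.stem  # default module name: file name without .py
    spec = importlib.util.spec_from_file_location(name, str(path))
    if spec is None or spec.loader is None:
        raise ImportError(f"Cannot load a module from {path}")
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)  # runs the file's top-level code once
    return module


# Possible use in this notebook, assuming csv_manager.py defines create_csv:
# csv_manager = load_module_from_path('../src/csv_manager.py')
# create_csv = csv_manager.create_csv
```

Binding `create_csv` this way would leave the rest of the notebook unchanged.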

---

## Base download function

In [2]:
from pybaseball import statcast
import time

def download_statcast_data(year):
    """Download one full year of Statcast data and save it as a raw CSV."""
    try:
        # Date range covering the whole calendar year
        start_date = f"{year}-01-01"
        end_date = f"{year}-12-31"
        
        print(f"\n{'='*60}")
        print(f"Descargando datos de {start_date} a {end_date}...")
        print(f"{'='*60}")
        
        inicio = time.time()
        
        df = statcast(start_dt=start_date, end_dt=end_date, parallel=True)
        
        tiempo_descarga = time.time() - inicio
        
        folder_name = "statcast_raw_data"
        create_csv(df, folder_name, year)

        filas, columnas = df.shape
        print(f"Datos descargados: {filas:,} filas y {columnas} columnas")
        print(f"Tiempo de descarga: {tiempo_descarga:.2f} segundos")
        
        print("\nPrimeras 3 filas:")
        print(df.head(3))
        
    except Exception as e:
        print(f"Error descargando datos para {year}: {e}")
        print("Los datos parciales deberían estar guardados en caché")
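
`create_csv` lives in `../src/csv_manager.py`, which this notebook does not show. Judging only from the messages in the cell output (a path such as `../data/statcast_raw_data/statcast_2015.csv` followed by a row and column count), it might look roughly like the sketch below. The name `create_csv_sketch`, the `base_dir` parameter, the `index=False` choice and the whole body are assumptions, not the project's actual code.

```python
from pathlib import Path


def create_csv_sketch(df, folder_name, year, base_dir="../data"):
    """Write df to <base_dir>/<folder_name>/statcast_<year>.csv (assumed layout)."""
    folder = Path(base_dir) / folder_name
    folder.mkdir(parents=True, exist_ok=True)  # create the folder if it is missing
    path = folder / f"statcast_{year}.csv"
    # Assumption: the DataFrame index is not written to the file
    df.to_csv(path, index=False)
    rows, cols = df.shape
    # Messages mirror the ones seen in the notebook output
    print(f"Datos guardados en: {path}")
    print(f"Archivo creado con {rows} filas y {cols} columnas")
    return path
```

The function only relies on `df.to_csv` and `df.shape`, so any pandas DataFrame should work with it.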

---

## Download by year
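
The cells in this section call `download_statcast_data` once per season. The same sequence could also be driven by a loop. This is a minimal sketch: `download_all_seasons`, its `download` parameter and the optional `pause_seconds` delay are illustrative assumptions, not part of the notebook.

```python
import time


def download_all_seasons(download, first_year=2015, last_year=2024, pause_seconds=0):
    """Call download(year) for every season from first_year to last_year inclusive."""
    attempted = []
    for year in range(first_year, last_year + 1):
        download(year)
        attempted.append(year)
        if pause_seconds and year != last_year:
            time.sleep(pause_seconds)  # optional pause between seasons
    return attempted


# Possible use in this notebook:
# download_all_seasons(download_statcast_data)
```

Because `download_statcast_data` catches its own exceptions, the returned list records the seasons that were attempted, not necessarily the ones that succeeded. Running one cell per year, as the notebook does, keeps each season's output separate and makes a single failed year easy to rerun.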

In [3]:
download_statcast_data(2015)


Descargando datos de 2015-01-01 a 2015-12-31...
This is a large query, it may take a moment to complete
Skipping offseason dates
Skipping offseason dates


100%|██████████| 211/211 [01:19<00:00,  2.66it/s]
  final_data = pd.concat(dataframe_list, axis=0).convert_dtypes(convert_string=False)


Datos guardados en: ../data/statcast_raw_data/statcast_2015.csv
Archivo creado con 712839 filas y 118 columnas
Datos descargados: 712,839 filas y 118 columnas
Tiempo de descarga: 95.48 segundos

Primeras 3 filas:
    pitch_type  game_date  release_speed  release_pos_x  release_pos_z  \
192         FF 2015-11-01           96.1          -2.02           6.25   
200         FC 2015-11-01           93.1          -1.66           6.24   
206         FF 2015-11-01           97.0          -1.64            6.3   

     player_name  batter  pitcher     events    description  ...  \
192  Davis, Wade  527038   451584  strikeout  called_strike  ...   
200  Davis, Wade  527038   451584       None           foul  ...   
206  Davis, Wade  527038   451584       None           foul  ...   

     batter_days_until_next_game  api_break_z_with_gravity  api_break_x_arm  \
192                         <NA>                       1.1              0.1   
200                         <NA>                      2.21 

In [4]:
download_statcast_data(2016)


Descargando datos de 2016-01-01 a 2016-12-31...
This is a large query, it may take a moment to complete
Skipping offseason dates
Skipping offseason dates


100%|██████████| 214/214 [01:15<00:00,  2.84it/s]
  final_data = pd.concat(dataframe_list, axis=0).convert_dtypes(convert_string=False)


Datos guardados en: ../data/statcast_raw_data/statcast_2016.csv
Archivo creado con 726273 filas y 118 columnas
Datos descargados: 726,273 filas y 118 columnas
Tiempo de descarga: 87.61 segundos

Primeras 3 filas:
    pitch_type  game_date  release_speed  release_pos_x  release_pos_z  \
325         CU 2016-11-02           76.6           2.93           6.59   
335         CU 2016-11-02           74.3           3.04           6.51   
203         FF 2016-11-02           94.9          -1.29           6.46   

           player_name  batter  pitcher     events    description  ...  \
325   Montgomery, Mike  492841   543557  field_out  hit_into_play  ...   
335   Montgomery, Mike  492841   543557       None  called_strike  ...   
203  Edwards Jr., Carl  434658   605218     single  hit_into_play  ...   

     batter_days_until_next_game  api_break_z_with_gravity  api_break_x_arm  \
325                         <NA>                      5.26            -0.78   
335                         <NA>   

In [5]:
download_statcast_data(2017)


Descargando datos de 2017-01-01 a 2017-12-31...
This is a large query, it may take a moment to complete
Skipping offseason dates
Skipping offseason dates


100%|██████████| 214/214 [01:18<00:00,  2.72it/s]
  final_data = pd.concat(dataframe_list, axis=0).convert_dtypes(convert_string=False)


Datos guardados en: ../data/statcast_raw_data/statcast_2017.csv
Archivo creado con 732477 filas y 118 columnas
Datos descargados: 732,477 filas y 118 columnas
Tiempo de descarga: 89.45 segundos

Primeras 3 filas:
  pitch_type  game_date  release_speed  release_pos_x  release_pos_z  \
0         FF 2017-11-01           96.6          -1.85           5.93   
1         FF 2017-11-01           95.0          -1.99           5.86   
2         FF 2017-11-01           96.1          -2.06           5.92   

       player_name  batter  pitcher     events      description  ...  \
0  Morton, Charlie  608369   450203  field_out    hit_into_play  ...   
1  Morton, Charlie  621035   450203  field_out    hit_into_play  ...   
2  Morton, Charlie  621035   450203       None  swinging_strike  ...   

   batter_days_until_next_game  api_break_z_with_gravity  api_break_x_arm  \
0                         <NA>                      1.54             0.93   
1                         <NA>                       1.

In [6]:
download_statcast_data(2018)


Descargando datos de 2018-01-01 a 2018-12-31...
This is a large query, it may take a moment to complete
Skipping offseason dates
Skipping offseason dates


100%|██████████| 214/214 [01:18<00:00,  2.72it/s]
  final_data = pd.concat(dataframe_list, axis=0).convert_dtypes(convert_string=False)


Datos guardados en: ../data/statcast_raw_data/statcast_2018.csv
Archivo creado con 731207 filas y 118 columnas
Datos descargados: 731,207 filas y 118 columnas
Tiempo de descarga: 88.30 segundos

Primeras 3 filas:
    pitch_type  game_date  release_speed  release_pos_x  release_pos_z  \
154         SL 2018-10-28           84.0           3.05           5.26   
159         FF 2018-10-28           95.3           3.17            5.5   
165         FF 2018-10-28           96.4           3.07           5.54   

     player_name  batter  pitcher     events      description  ...  \
154  Sale, Chris  592518   519242  strikeout  swinging_strike  ...   
159  Sale, Chris  592518   519242       None             ball  ...   
165  Sale, Chris  592518   519242       None             foul  ...   

     batter_days_until_next_game  api_break_z_with_gravity  api_break_x_arm  \
154                         <NA>                      3.59            -1.35   
159                         <NA>                   

In [7]:
download_statcast_data(2019)


Descargando datos de 2019-01-01 a 2019-12-31...
This is a large query, it may take a moment to complete
Skipping offseason dates
Skipping offseason dates


100%|██████████| 225/225 [01:26<00:00,  2.60it/s]
  final_data = pd.concat(dataframe_list, axis=0).convert_dtypes(convert_string=False)


Datos guardados en: ../data/statcast_raw_data/statcast_2019.csv
Archivo creado con 760498 filas y 118 columnas
Datos descargados: 760,498 filas y 118 columnas
Tiempo de descarga: 97.68 segundos

Primeras 3 filas:
    pitch_type  game_date  release_speed  release_pos_x  release_pos_z  \
218         SL 2019-10-30           87.9          -2.65            5.5   
227         FF 2019-10-30           95.9          -2.77           5.52   
231         FF 2019-10-30           96.5          -2.68           5.42   

        player_name  batter  pitcher     events      description  ...  \
218  Hudson, Daniel  488726   543339  strikeout  swinging_strike  ...   
227  Hudson, Daniel  488726   543339       None             foul  ...   
231  Hudson, Daniel  488726   543339       None             ball  ...   

     batter_days_until_next_game  api_break_z_with_gravity  api_break_x_arm  \
218                         <NA>                      2.75            -0.02   
227                         <NA>       

In [8]:
download_statcast_data(2020)


Descargando datos de 2020-01-01 a 2020-12-31...
This is a large query, it may take a moment to complete
Skipping offseason dates
Skipping offseason dates


100%|██████████| 97/97 [00:39<00:00,  2.44it/s]
  final_data = pd.concat(dataframe_list, axis=0).convert_dtypes(convert_string=False)


Datos guardados en: ../data/statcast_raw_data/statcast_2020.csv
Archivo creado con 279660 filas y 118 columnas
Datos descargados: 279,660 filas y 118 columnas
Tiempo de descarga: 43.51 segundos

Primeras 3 filas:
   pitch_type  game_date  release_speed  release_pos_x  release_pos_z  \
75         FF 2020-10-27           96.7           1.58           5.99   
80         FF 2020-10-27           94.1           2.91           5.45   
85         FF 2020-10-27           94.9           1.77           6.02   

     player_name  batter  pitcher     events      description  ...  \
75  Urías, Julio  642715   628711  strikeout    called_strike  ...   
80  Urías, Julio  642715   628711       None    called_strike  ...   
85  Urías, Julio  642715   628711       None  swinging_strike  ...   

    batter_days_until_next_game  api_break_z_with_gravity  api_break_x_arm  \
75                         <NA>                      0.82             0.18   
80                         <NA>                      1.28

In [9]:
download_statcast_data(2021)


Descargando datos de 2021-01-01 a 2021-12-31...
This is a large query, it may take a moment to complete
Skipping offseason dates
Skipping offseason dates


100%|██████████| 246/246 [01:41<00:00,  2.43it/s]
  final_data = pd.concat(dataframe_list, axis=0).convert_dtypes(convert_string=False)


Datos guardados en: ../data/statcast_raw_data/statcast_2021.csv
Archivo creado con 763191 filas y 118 columnas
Datos descargados: 763,191 filas y 118 columnas
Tiempo de descarga: 117.83 segundos

Primeras 3 filas:
    pitch_type  game_date  release_speed  release_pos_x  release_pos_z  \
109         FF 2021-11-02           93.7           1.39           6.72   
118         FF 2021-11-02           92.9           1.38           6.72   
122         FF 2021-11-02           93.1           1.35           6.73   

     player_name  batter  pitcher     events    description  ...  \
109  Smith, Will  493329   519293  field_out  hit_into_play  ...   
118  Smith, Will  493329   519293       None           foul  ...   
122  Smith, Will  493329   519293       None  called_strike  ...   

     batter_days_until_next_game  api_break_z_with_gravity  api_break_x_arm  \
109                         <NA>                      1.39             0.57   
118                         <NA>                      1.32

In [10]:
download_statcast_data(2022)


Descargando datos de 2022-01-01 a 2022-12-31...
This is a large query, it may take a moment to complete
Skipping offseason dates
Skipping offseason dates


100%|██████████| 246/246 [01:35<00:00,  2.56it/s]
  final_data = pd.concat(dataframe_list, axis=0).convert_dtypes(convert_string=False)


Datos guardados en: ../data/statcast_raw_data/statcast_2022.csv
Archivo creado con 773618 filas y 118 columnas
Datos descargados: 773,618 filas y 118 columnas
Tiempo de descarga: 112.90 segundos

Primeras 3 filas:
    pitch_type  game_date  release_speed  release_pos_x  release_pos_z  \
195         SL 2022-11-05           89.2          -0.06           6.14   
202         FF 2022-11-05           93.9          -0.18           5.94   
212         FF 2022-11-05           93.0          -0.09           5.97   

       player_name  batter  pitcher     events    description  ...  \
195  Pressly, Ryan  592206   519151  field_out  hit_into_play  ...   
202  Pressly, Ryan  547180   519151  field_out  hit_into_play  ...   
212  Pressly, Ryan  592663   519151     single  hit_into_play  ...   

     batter_days_until_next_game  api_break_z_with_gravity  api_break_x_arm  \
195                         <NA>                      2.47            -0.41   
202                         <NA>                  

In [11]:
download_statcast_data(2023)


Descargando datos de 2023-01-01 a 2023-12-31...
This is a large query, it may take a moment to complete
Skipping offseason dates
Skipping offseason dates


100%|██████████| 246/246 [01:35<00:00,  2.58it/s]
  final_data = pd.concat(dataframe_list, axis=0).convert_dtypes(convert_string=False)


Datos guardados en: ../data/statcast_raw_data/statcast_2023.csv
Archivo creado con 771057 filas y 118 columnas
Datos descargados: 771,057 filas y 118 columnas
Tiempo de descarga: 107.29 segundos

Primeras 3 filas:
    pitch_type  game_date  release_speed  release_pos_x  release_pos_z  \
119         CU 2023-11-01           84.9          -1.19           6.12   
124         FF 2023-11-01           96.6          -0.69           6.24   
128         CU 2023-11-01           84.5          -1.27           6.11   

     player_name  batter  pitcher     events    description  ...  \
119  Sborz, Josh  606466   622250  strikeout  called_strike  ...   
124  Sborz, Josh  606466   622250       None           ball  ...   
128  Sborz, Josh  606466   622250       None  called_strike  ...   

     batter_days_until_next_game  api_break_z_with_gravity  api_break_x_arm  \
119                         <NA>                      4.08            -1.14   
124                         <NA>                      1.01

In [12]:
download_statcast_data(2024)


Descargando datos de 2024-01-01 a 2024-12-31...
This is a large query, it may take a moment to complete
Skipping offseason dates
Skipping offseason dates


100%|██████████| 246/246 [01:32<00:00,  2.66it/s]
  final_data = pd.concat(dataframe_list, axis=0).convert_dtypes(convert_string=False)


Datos guardados en: ../data/statcast_raw_data/statcast_2024.csv
Archivo creado con 757714 filas y 118 columnas
Datos descargados: 757,714 filas y 118 columnas
Tiempo de descarga: 103.80 segundos

Primeras 3 filas:
    pitch_type  game_date  release_speed  release_pos_x  release_pos_z  \
160         KC 2024-10-30           77.5          -1.11           5.65   
171         KC 2024-10-30           78.7          -1.01           5.73   
179         FC 2024-10-30           93.1          -1.19           5.53   

         player_name  batter  pitcher     events              description  \
160  Buehler, Walker  657077   621111  strikeout  swinging_strike_blocked   
171  Buehler, Walker  657077   621111       None          swinging_strike   
179  Buehler, Walker  657077   621111       None          swinging_strike   

     ...  batter_days_until_next_game  api_break_z_with_gravity  \
160  ...                         <NA>                      5.23   
171  ...                         <NA>         
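
As a quick check on the runs above, the row counts reported in the logs can be tallied. The numbers are copied from the "Datos descargados" lines of each cell; the variable names are illustrative.

```python
# Rows per season, as reported in the notebook output
rows_by_year = {
    2015: 712_839,
    2016: 726_273,
    2017: 732_477,
    2018: 731_207,
    2019: 760_498,
    2020: 279_660,
    2021: 763_191,
    2022: 773_618,
    2023: 771_057,
    2024: 757_714,
}

total_rows = sum(rows_by_year.values())
smallest_year = min(rows_by_year, key=rows_by_year.get)

print(f"Total: {total_rows:,} rows across {len(rows_by_year)} seasons")
# Total: 7,008,534 rows across 10 seasons
print(f"Smallest season: {smallest_year} ({rows_by_year[smallest_year]:,} rows)")
# Smallest season: 2020 (279,660 rows)
```

The much lower 2020 count is consistent with that year's shortened season. It also matches the progress bar, which shows 97 query days for 2020 against 211 to 246 for the other years.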