# Calvinball

This project calculates each MLB team's record in games played under Calvinball rules: any extra-inning game or seven-inning game. It is built on the [pybaseball](https://pypi.org/project/pybaseball/) library.

## Imports

The script needs the `pandas` library, `schedule_and_record` from `pybaseball`, and the team groupings from the `teams` module (all of MLB, each league, and each division). Each grouping is a list of strings.

In [3]:
import pandas as pd
from pybaseball import schedule_and_record
from teams import mlb, al, nl, al_east, al_central, al_west, nl_east, nl_central, nl_west
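
The `teams` module itself is not shown in this notebook. As a rough sketch, assuming it holds plain lists of Baseball-Reference-style team abbreviations (the abbreviations and the `demo_` names below are illustrative assumptions, not the project's actual file), the groupings could be composed like this:

```python
# Hypothetical sketch of how team lists might be composed; not the project's teams.py.
demo_al_east = ["BAL", "BOS", "NYY", "TBR", "TOR"]
demo_al_central = ["CHW", "CLE", "DET", "KCR", "MIN"]
demo_al_west = ["HOU", "LAA", "OAK", "SEA", "TEX"]

# A league list is just the concatenation of its division lists.
demo_al = demo_al_east + demo_al_central + demo_al_west
print(len(demo_al))
```

Building the larger lists from the division lists keeps each abbreviation in one place.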

## Functions

Here are the functions used for the script.

### `process_teams`

This function takes a list of teams and does the following:
* Creates an empty data frame with headers for `Team`, `Wins`, `Losses`, and `Win %`.
* Iterates over each team in the list of teams.
* Fetches the team's 2021 schedule and record and renames the `W/L` column to `Rec`. **Note:** I had trouble filtering this data frame with a header of `W/L`. I think the `/` caused a problem, so I renamed the column to avoid the issue. If anybody has an explanation, I would be interested.
* Creates the variable `cb_games`, a data frame holding only the Calvinball games. See the explanation for the `filter_cb_games` function below.
* Creates the variable `cb_wins`, the number of Calvinball wins. See the explanation for the `count_wins` function below.
* Creates the variable `cb_losses`, the number of Calvinball losses. See the explanation for the `count_losses` function below.
* Creates the variable `cb_total`, the total number of Calvinball games played. It is the length of the `cb_games` data frame.
* Creates the variable `cb_win_percentage`, the number of Calvinball wins divided by the total number of Calvinball games.
* Creates the variable `cb_info`, a dictionary whose keys correspond to the headers in the `cb_records` data frame.
* Finally, appends the `cb_info` dictionary to the `cb_records` data frame, where it becomes a new row.
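
On the `W/L` note in the list above: `DataFrame.query` evaluates its argument as an expression, so an unquoted `W/L` is read as `W` divided by `L` rather than as a column name. A small sketch on made-up data, assuming a pandas version that accepts backtick-quoted names containing special characters (1.0 or later, as far as I know), shows two ways to filter without renaming:

```python
import pandas as pd

# Made-up results, not real game data.
games = pd.DataFrame({"W/L": ["W", "L", "W-wo", "L-wo"], "Inn": [10, 7, 11, 9]})

# Option 1: backtick-quote the column name inside query().
wins_query = games.query('`W/L` == "W" or `W/L` == "W-wo"')

# Option 2: skip query() and use a boolean mask.
wins_mask = games[games["W/L"].isin(["W", "W-wo"])]

print(len(wins_query), len(wins_mask))
```

Renaming the column, as the script does, remains the simplest choice when the same column is queried several times.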

In [4]:
def process_teams(teams):
    cb_records = pd.DataFrame(columns=['Team', 'Wins', 'Losses', 'Win %'])
    for team in teams:
        data = schedule_and_record(2021, team).rename(columns={"W/L" : "Rec"})
        cb_games = filter_cb_games(data)
        cb_wins = count_wins(cb_games)
        cb_losses = count_losses(cb_games)
        cb_total = len(cb_games)
        # Guard against a team with no Calvinball games (avoids ZeroDivisionError).
        cb_win_percentage = cb_wins / cb_total if cb_total else 0.0
        cb_info = {'Team' : team, 'Wins' : cb_wins, 'Losses' : cb_losses, 'Win %' : cb_win_percentage}
        # DataFrame.append was removed in pandas 2.0; concatenate a one-row frame instead.
        cb_records = pd.concat([cb_records, pd.DataFrame([cb_info])], ignore_index=True)
    return cb_records
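
As an alternative to growing the data frame one row at a time, the per-team dictionaries can be collected in a list and turned into a data frame once at the end. The sketch below uses made-up numbers and a hypothetical helper name, `records_from_rows`, and skips the `pybaseball` calls so it runs offline:

```python
import pandas as pd

def records_from_rows(rows):
    # Hypothetical helper: build the records table from a list of dicts in one step.
    return pd.DataFrame(rows, columns=["Team", "Wins", "Losses", "Win %"])

rows = []
for team, wins, losses in [("AAA", 6, 4), ("BBB", 3, 5)]:  # made-up teams and totals
    total = wins + losses
    rows.append({"Team": team, "Wins": wins, "Losses": losses,
                 "Win %": wins / total if total else 0.0})

demo_records = records_from_rows(rows)
print(demo_records)
```

Building the frame once avoids copying the accumulated rows on every iteration.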

### `filter_cb_games`

This function takes a data frame of a team's games so far and returns only the rows where the innings played (`Inn`) are greater than 9 or equal to 7.

**Note:** A seven-inning doubleheader game can itself go into "extra" innings and end after 8 or 9 innings, which would escape this filter. This function needs further work.

In [5]:
def filter_cb_games(game_data):
    cb_games = game_data.query('Inn > 9 or Inn == 7')
    return cb_games
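
To make the filter and its blind spot concrete, here is a small sketch on made-up data (the `sample` frame is invented for illustration and is not `pybaseball` output):

```python
import pandas as pd

# Made-up innings totals, not real game data.
sample = pd.DataFrame({"Inn": [9, 10, 7, 8], "Rec": ["W", "L-wo", "W", "L"]})

# Same condition as filter_cb_games.
cb_sample = sample.query("Inn > 9 or Inn == 7")
print(cb_sample["Inn"].tolist())

# An 8-inning row, e.g. a doubleheader game that went one extra inning, is left out.
missed = sample.query("Inn == 8")
print(len(missed))
```

Catching those games would need extra information, such as whether the game was part of a doubleheader, which this sketch does not attempt.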

### `count_wins`

This function takes a data frame of a team's Calvinball games. The `wins` variable is a data frame holding only the rows where the team won, meaning `Rec` is `W` or `W-wo` (a walk-off win). The function returns the length of `wins`, which is the number of Calvinball wins.

In [6]:
def count_wins(game_data):
    wins = game_data.query('Rec == "W" or Rec == "W-wo"')
    return len(wins)


### `count_losses`

This function takes a data frame of a team's Calvinball games. The `losses` variable is a data frame holding only the rows where the team lost, meaning `Rec` is `L` or `L-wo` (a walk-off loss). The function returns the length of `losses`, which is the number of Calvinball losses.

In [7]:
def count_losses(game_data):
    losses = game_data.query('Rec == "L" or Rec == "L-wo"')
    return len(losses)
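
Both counting functions follow the same pattern, so the two tallies can also be taken together. A minimal sketch on made-up data, using a hypothetical helper named `count_results`:

```python
import pandas as pd

def count_results(game_data):
    # Hypothetical helper: return (wins, losses), counting walk-offs ("-wo") too.
    wins = int(game_data["Rec"].isin(["W", "W-wo"]).sum())
    losses = int(game_data["Rec"].isin(["L", "L-wo"]).sum())
    return wins, losses

sample = pd.DataFrame({"Rec": ["W", "L", "W-wo", "L-wo", "W"]})  # made-up results
wins, losses = count_results(sample)
print(wins, losses)
```

Using `isin` keeps the list of accepted labels in one place if more result codes ever need to be counted.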

### `cb_champs`

This function determines who has the best Calvinball record. The function takes a list of teams and returns a printable string. The `cb_data` variable is a data frame holding each team's name, number of Calvinball wins, number of Calvinball losses, and Calvinball winning percentage. The data frame is sorted by winning percentage (`.sort_values(by='Win %')`) in descending order (`ascending=False`) and then converted to a string without the index (`to_string(index=False)`).

In [8]:
def cb_champs(teams):
    cb_data = process_teams(teams)
    return cb_data.sort_values(by='Win %', ascending=False).to_string(index=False)

### `cb_losers`

This function determines who has the worst Calvinball record. The function takes a list of teams and returns a printable string. The `cb_data` variable is a data frame holding each team's name, number of Calvinball wins, number of Calvinball losses, and Calvinball winning percentage. The data frame is sorted by winning percentage in ascending order, the default for `.sort_values(by='Win %')`, and then converted to a string without the index (`to_string(index=False)`).

In [9]:
def cb_losers(teams):
    cb_data = process_teams(teams)
    return cb_data.sort_values(by='Win %').to_string(index=False)
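
The two ranking functions differ only in sort direction. A short sketch on made-up records (not real standings) shows both orders and the type that `to_string` returns:

```python
import pandas as pd

# Made-up records, not real standings.
demo = pd.DataFrame({"Team": ["AAA", "BBB", "CCC"],
                     "Wins": [3, 6, 5],
                     "Losses": [5, 4, 5],
                     "Win %": [0.375, 0.6, 0.5]})

best_first = demo.sort_values(by="Win %", ascending=False)
worst_first = demo.sort_values(by="Win %")

# to_string returns a str, so the result is meant for printing, not further filtering.
table = best_first.to_string(index=False)
print(table)
```

If the sorted data is needed for more processing, keeping the data frame and calling `to_string` only at print time would be one option.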

## Running the Script

To run the script, call either `cb_champs` or `cb_losers` and pass them one of the teams variables (`mlb`, `al`, `al_east`, `al_central`, `al_west`, `nl`, `nl_east`, `nl_central`, or `nl_west`).

In [None]:
print(cb_champs(mlb))