This notebook was designed to work with [Google Colab](https://colab.research.google.com/github/lokdoesdata/syracuse-assorted/blob/main/ist_652/lab_1/lok_ngan_lab_problem_1.ipynb).

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/lokdoesdata/syracuse-assorted/blob/main/ist_652/lab_1/lok_ngan_lab_problem_1.ipynb)

# IST 652 - Lab 1

Lok Ngan

Due: April 20, 2021

----------

For the NBAfile.py program, for each line, create a string using string formatting that puts the team, attendance, and ticket prices into a formatted string. Each line should look something like:  

‘The attendance at Atlanta was 13993 and the ticket price was $20.06’  

Your program should then print these strings instead of the lines. **Submit your code and the output of your program.** Submit assignment as a .txt, .py, .pdf, or jupyter notebook file.  

## Set Up

There are many ways to complete this lab assignment.  `pandas` was chosen because it was the most straightforward option, and `pathlib` is used for path handling.

In [1]:
# I/O
from pathlib import Path

# Data
import pandas as pd

## Data

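The *NBA-Attendance-1989.txt* file is stored in the `data` subdirectory of the repository and is read here directly from its raw GitHub URL.  Output files are written to an `output` subdirectory of the current working directory.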

In [2]:
DATA_PATH = r'https://raw.githubusercontent.com/lokdoesdata/syracuse-assorted/main/ist_652/lab_1/data/nba-attendance-1989.txt'

OUTPUT_PATH = Path.cwd().joinpath('output')
OUTPUT_PATH.mkdir(parents=True, exist_ok=True) 

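The data was inspected and found to be tab-separated, with some fields separated by more than one tab.  Setting `sep` to the regular expression `'\t+'` (one or more tabs) handles both cases; regular-expression separators require the Python parsing engine, hence `engine='python'`.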

In [3]:
df = pd.read_csv(
    DATA_PATH,
    header=None,
    names=['Team', 'Attendance', 'Ticket Price'],
    sep='\t+',
    engine='python')
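
To see what the `'\t+'` pattern does independently of pandas, here is a minimal sketch using the standard library's `re.split`. The two sample lines are hypothetical (the second is assumed to contain a doubled tab); they are not copied from the data file.

```python
import re

# Hypothetical sample lines; the second uses two tabs between the first fields.
sample_lines = [
    'Atlanta\t13993\t20.06',
    'Boston\t\t14916\t22.54',
]

for line in sample_lines:
    # Splitting on a single tab leaves an empty field where tabs are doubled.
    single = line.split('\t')
    # Splitting on one-or-more tabs treats a run of tabs as one separator.
    collapsed = re.split(r'\t+', line)
    print(len(single), len(collapsed), collapsed)
```

With a single-tab separator the second line would yield four fields instead of three, which is the misalignment the regular expression avoids.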

## Helper

In [4]:
def generate_string(team: str, attendance: int, ticket_price: float) -> str:
    """Generate a string summarizing the attendance and ticket price for an NBA team.

    Parameters
    ----------
    team: str, name of the team
    attendance: nonnegative int, attendance of the team
    ticket_price: nonnegative float, ticket price of the team

    Returns
    -------
    A string in the format of:
    'The attendance at {team} was {attendance} and the ticket price was ${ticket_price}.'

    Example
    -------
    >>> generate_string('Atlanta', 13993, 20.06)
    'The attendance at Atlanta was 13993 and the ticket price was $20.06.'
    """

    # Reject values that cannot be valid attendance figures or prices.
    if attendance < 0 or ticket_price < 0:
        raise ValueError('Attendance and Ticket Price cannot be negative numbers!')

    return f'The attendance at {team} was {attendance} and the ticket price was ${ticket_price}.'
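
Interpolating `ticket_price` directly prints a price such as 17.0 as `$17.0`. If two decimal places are wanted, one option is the `:.2f` format specifier. The sketch below is an optional variant, not the helper used in this notebook; the name `generate_string_2dp` is hypothetical.

```python
def generate_string_2dp(team: str, attendance: int, ticket_price: float) -> str:
    """Hypothetical variant of the helper that always shows two decimal places."""
    # ':.2f' formats the float with exactly two digits after the decimal point.
    return (
        f'The attendance at {team} was {attendance} '
        f'and the ticket price was ${ticket_price:.2f}.'
    )


print(generate_string_2dp('Charlotte', 23901, 17.0))
```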

## Apply the Helper Function

Apply the function to a slice of the DataFrame containing only the Team, Attendance, and Ticket Price columns.  This keeps the cell repeatable: `generate_string` accepts exactly three arguments, so once the fourth column (`Output`) has been added, re-running the cell on the whole DataFrame would raise an error.

In [5]:
df['Output'] = df[['Team', 'Attendance', 'Ticket Price']].apply(lambda x: generate_string(*x), axis=1)
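
The reason for slicing can be illustrated without pandas. In the sketch below, plain tuples stand in for DataFrame rows (an assumption made for illustration), and a simplified copy of the helper shows that unpacking a four-value row into a three-argument function raises a `TypeError`.

```python
def generate_string(team, attendance, ticket_price):
    # Simplified stand-in for the notebook's helper (no validation).
    return f'The attendance at {team} was {attendance} and the ticket price was ${ticket_price}.'


# A row with the three original columns.
row_three = ('Atlanta', 13993, 20.06)
# The same row after an 'Output' value has been appended as a fourth column.
row_four = row_three + (generate_string(*row_three),)

print(generate_string(*row_three))

try:
    generate_string(*row_four)  # four values for three parameters
    failed = False
except TypeError as err:
    failed = True
    print(f'TypeError: {err}')
```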

## Output

In [6]:
for line in df['Output'].tolist():
    print(line)

The attendance at Atlanta was 13993 and the ticket price was $20.06.
The attendance at Boston was 14916 and the ticket price was $22.54.
The attendance at Charlotte was 23901 and the ticket price was $17.0.
The attendance at Chicago was 18404 and the ticket price was $21.98.
The attendance at Cleveland was 16969 and the ticket price was $19.63.
The attendance at Dallas was 16868 and the ticket price was $17.05.
The attendance at Denver was 12668 and the ticket price was $17.4.
The attendance at Detroit was 21454 and the ticket price was $24.42.
The attendance at Golden_State was 15025 and the ticket price was $17.04.
The attendance at Houston was 15846 and the ticket price was $17.56.
The attendance at Indiana was 12885 and the ticket price was $13.77.
The attendance at LA_Clippers was 11869 and the ticket price was $21.95.
The attendance at LA_Lakers was 17378 and the ticket price was $29.18.
The attendance at Miami was 15008 and the ticket price was $17.6.
The attendance at Milwaukee

In [7]:
df['Output'].to_csv(OUTPUT_PATH.joinpath('lok_ngan_lab_problem_1_output.csv'), index=False)
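
By default, `to_csv` also writes the column name as a header row. If a plain text file with one sentence per line is preferred, a minimal sketch using only `pathlib` could look like the following. The file name `example_output.txt` and the temporary directory are assumptions made so the example is self-contained.

```python
import tempfile
from pathlib import Path

lines = [
    'The attendance at Atlanta was 13993 and the ticket price was $20.06.',
    'The attendance at Boston was 14916 and the ticket price was $22.54.',
]

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical output location inside a throwaway directory.
    out_file = Path(tmp) / 'output' / 'example_output.txt'
    out_file.parent.mkdir(parents=True, exist_ok=True)
    # Write one sentence per line, with no header row.
    out_file.write_text('\n'.join(lines) + '\n', encoding='utf-8')
    # Read the file back to confirm the round trip.
    written = out_file.read_text(encoding='utf-8').splitlines()

print(written == lines)
```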

A copy of the output can be obtained [here](https://raw.githubusercontent.com/lokdoesdata/syracuse-assorted/main/ist_652/lab_1/output/lok_ngan_lab_problem_1_output.csv).