# Processing mixed beverage data

## October 2017

This Jupyter Notebook uses the data.texas.gov [Mixed Beverage Gross Receipts data](https://data.texas.gov/Government-and-Taxes/Mixed-Beverage-Gross-Receipts/naix-2893) and a Python library called [agate](http://agate.readthedocs.io/) to clean and process that data for [stories similar to this one](http://www.mystatesman.com/business/austin-alcohol-sales-percent-february/Oo2txZUkuDlqBl0rU9O1lJ/) on monthly alcohol sales.

This is a work in progress.

- The first version will process a downloaded file. The data is in a new format, so the processing has to be reworked.
- The second phase will work on pulling the data directly from Socrata.
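
As a rough sketch of what the second phase might look like: Socrata datasets are generally reachable through a resource endpoint keyed on the dataset ID (`naix-2893` in the link above). The snippet below only builds the request URL and does not download anything. The helper name `socrata_csv_url` and the row limit are assumptions for illustration, not part of this notebook.

```python
from urllib.parse import urlencode

# Dataset ID taken from the data.texas.gov link above
DATASET_ID = 'naix-2893'

def socrata_csv_url(dataset_id, limit=50000, where=None):
    """Build a CSV request URL for a data.texas.gov dataset (hypothetical helper)."""
    params = {'$limit': limit}
    if where:
        # the caller supplies the filter; field names must match the dataset
        params['$where'] = where
    return 'https://data.texas.gov/resource/{}.csv?{}'.format(
        dataset_id, urlencode(params)
    )

url = socrata_csv_url(DATASET_ID, limit=10)
print(url)
```

Any `where` filter would need the dataset's real field names, which should be checked on the dataset page first.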

In [1]:
# import libraries
import agate



In [2]:
# this suppresses the timezone warning
# Might comment out during development so other warnings
# are not suppressed
import warnings
warnings.filterwarnings('ignore')
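
If silencing every warning turns out to be too broad, a narrower option is to ignore only warnings whose message matches a pattern, and only while a given call runs. This is a minimal sketch using the standard `warnings` module. The helper name `run_quietly` and the message text in the demo are made up, since the exact wording of the timezone warning is not shown here.

```python
import warnings

def run_quietly(func, message_pattern):
    """Call func() while ignoring only warnings whose text starts with the pattern."""
    with warnings.catch_warnings():
        # 'message' is a regular expression matched against the start of the warning text
        warnings.filterwarnings('ignore', message=message_pattern)
        return func()

def noisy():
    # stand-in for a call that raises one expected and one unexpected warning
    warnings.warn('timezone data missing')
    warnings.warn('something else')
    return 1

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter('always')
    result = run_quietly(noisy, 'timezone')

print([str(w.message) for w in caught])
```

Only the unexpected warning gets through, so other problems stay visible during development.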

Show downloaded files. The path is `../mixbev-files/YYYY`. There are also older files in `../mixbev-files/old-format/`.

In [3]:
ls ../mixbev-files/2017

Mixed_Beverage_Gross_Receipts_2017_09.csv
Mixed_Beverage_Gross_Receipts_2017_10.csv


## Import the file

We'll set some project variables, including the file name, and then import the data into agate.

In [4]:
# This is the source file, which should be downloaded
file = '../mixbev-files/2017/Mixed_Beverage_Gross_Receipts_2017_10.csv'

In [5]:
# Helps us import some text fields that might otherwise be mistaken for numbers.
specified_types = {
    'Taxpayer Number': agate.Text(),
    'Location Number': agate.Text(),
    'Taxpayer Zip': agate.Text(),
    'Location Zip': agate.Text(),
    'Location County': agate.Text(),
    'Taxpayer County': agate.Text()
}

# this imports the file specified above, along with the proper types
mixbev_raw = agate.Table.from_csv(file, column_types=specified_types)

# prints table fields so we can check those data types
print(mixbev_raw)

| column                     | data_type |
| -------------------------- | --------- |
| Taxpayer Number            | Text      |
| Taxpayer Name              | Text      |
| Taxpayer Address           | Text      |
| Taxpayer City              | Text      |
| Taxpayer State             | Text      |
| Taxpayer Zip               | Text      |
| Taxpayer County            | Text      |
| Location Number            | Text      |
| Location Name              | Text      |
| Location Address           | Text      |
| Location City              | Text      |
| Location State             | Text      |
| Location Zip               | Text      |
| Location County            | Text      |
| Inside/Outside City Limits | Boolean   |
| TABC Permit Number         | Text      |
| Responsibility Begin Date  | Date      |
| Responsibility End Date    | Date      |
| Obligation End Date        | Date      |
| Liquor Receipts            | Number    |
| Wine Receipts              | Number    |
| Beer Rece

## Create establishment column

We do this so each row identifies a single establishment, instead of grouping locations that share a trade name, like 'CHILI'S BAR & GRILL', across different addresses.

In [6]:
# Concatenates the name and address
mixbev_establishment = mixbev_raw.compute([
    ('Establishment', agate.Formula(agate.Text(), lambda row: '%(Location Name)s (%(Location Address)s)' % row))
])

# Uncomment line below to print Establishment to check what it looks like
# mixbev_establishment.select('Establishment').limit(5).print_table(max_column_width=80)
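
To see why the address is part of the label, here is the same format string applied to two made-up rows that share a trade name. Plain dictionaries stand in for table rows, and the addresses are invented for illustration.

```python
# Two made-up rows that share a trade name but not an address
rows = [
    {'Location Name': "CHILI'S BAR & GRILL", 'Location Address': '100 EXAMPLE ST'},
    {'Location Name': "CHILI'S BAR & GRILL", 'Location Address': '200 SAMPLE AVE'},
]

def establishment(row):
    """Apply the same format string used for the Establishment column."""
    return '%(Location Name)s (%(Location Address)s)' % row

labels = [establishment(row) for row in rows]
for label in labels:
    print(label)
```

The two labels differ, so grouping by Establishment keeps the two locations apart.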

## Import and merge counties lookup table

We do this to get county names. I got this list from the comptroller.

NOTE: Wisdom would suggest we join on the code column from counties, but the data.texas.gov data does not have the zero padding from those values, so I'm using the id column.

In [7]:
# importing counties.csv, ensuring that the 'code' and 'id' columns are text
counties = agate.Table.from_csv('../resource-files/counties.csv', column_types={
    'code': agate.Text(),
    'id': agate.Text()
})

# uncomment below to peek at the column names and an example
# print(counties)
# counties.limit(5).print_table()

# joins the counties table to the mixed bev data with establishments
mixbev_joined = mixbev_establishment.join(counties, 'Location County', 'id')

# uncomment below if you want to check that the merge was successful
# print(mixbev_joined)
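
If joining on `code` were ever preferred, one option would be to zero-pad the `Location County` values first. This is only a sketch: the helper name `pad_county_code` is made up, and the width of three characters is an assumption about how the codes in counties.csv are written, which this notebook does not confirm.

```python
def pad_county_code(value, width=3):
    """Left-pad a county code with zeros; width=3 is an assumption."""
    return str(value).strip().zfill(width)

padded = [pad_county_code(code) for code in ['5', '46', '227']]
print(padded)
```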

In [8]:
# get just the columns we need and rename county
# THIS is the finished, cleaned mixbev table
mixbev = mixbev_joined.select([
    'Location Name',
    'Location Address',
    'Establishment',
    'Location City',
    'Location State',
    'Location Zip',
    'county',
    'Total Receipts',
    'Obligation End Date'
]).rename(column_names = {
    'Location Name' : 'Name',
    'Location Address' : 'Address',
    'Location City': 'City',
    'Location State': 'State',
    'Location Zip': 'Zip',
    'Total Receipts' : 'Receipts',
    'county': 'County',
    'Obligation End Date': 'Report date'
})

# peek at the column names
print(mixbev)

| column        | data_type |
| ------------- | --------- |
| Name          | Text      |
| Address       | Text      |
| Establishment | Text      |
| City          | Text      |
| State         | Text      |
| Zip           | Text      |
| County        | Text      |
| Receipts      | Number    |
| Report date   | Date      |



## Location sums function

Because we want to get the top sellers in a bunch of cities and counties, we create a function so we don't have to repeat the code. This function allows us to pass in a city or county name to filter the monthly receipts table and then sum the Receipts column for each establishment. The result can then be acted on to print or aggregate. It is used later in the file.

In [9]:
# function to group sales by a specific location
# City or County passed in should be ALL CAPS
# Location_type can be 'City' or 'County'

def location_sum(location_type, location):
    # Filters the data to the specified city or county
    location_filtered = mixbev.where(lambda row: row[location_type].upper() == location)

    # groups the data based on Establishment and location
    location_grouped = location_filtered.group_by('Establishment').group_by(location_type)
    # computes the sales based on the grouping
    location_summary = location_grouped.aggregate([
        ('Receipts_sum', agate.Sum('Receipts'))
    ])
    
    # sorts the results by most sold
    location_summary_sorted = location_summary.order_by('Receipts_sum', reverse=True)

    # returns the sorted table; the caller limits and prints it
    return location_summary_sorted
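
The logic of `location_sum` can be sanity-checked on a few made-up rows. The sketch below is a plain-Python stand-in, not the agate code: it does the same filter, group, sum and sort using a dictionary. The name `location_sum_plain` and the sample rows are invented for illustration.

```python
from collections import defaultdict

# Made-up rows for illustration only; not real receipts
sample_rows = [
    {'Establishment': 'BAR A (1 MAIN ST)', 'City': 'AUSTIN', 'Receipts': 100},
    {'Establishment': 'BAR A (1 MAIN ST)', 'City': 'AUSTIN', 'Receipts': 50},
    {'Establishment': 'BAR B (2 OAK ST)', 'City': 'AUSTIN', 'Receipts': 300},
    {'Establishment': 'BAR C (3 ELM ST)', 'City': 'BUDA', 'Receipts': 900},
]

def location_sum_plain(rows, location_type, location):
    """Filter by location, sum Receipts per Establishment, sort descending."""
    totals = defaultdict(int)
    for row in rows:
        if row[location_type].upper() == location:
            totals[row['Establishment']] += row['Receipts']
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)

austin_sample = location_sum_plain(sample_rows, 'City', 'AUSTIN')
print(austin_sample)
```

The two BAR A rows collapse into one total, and the BUDA row is filtered out.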

In [10]:
# double-checking I'm looking at one report date
mixbev_dates = mixbev.select('Report date').distinct('Report date')
mixbev_dates.print_table()
print('\nNumber of records in table: {}'.format(
        len(mixbev))
     )

| Report date |
| ----------- |
|  2017-10-31 |

Number of records in table: 14753
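
The check above relies on eyeballing the printed table. A stricter version could raise an error when more than one report date shows up. This is a sketch in plain Python. The helper name `single_report_date` is made up, and it expects a simple list of dates rather than an agate table.

```python
from datetime import date

def single_report_date(dates):
    """Return the one distinct date, or raise if there is not exactly one."""
    distinct = set(dates)
    if len(distinct) != 1:
        raise ValueError(
            'Expected one report date, found {}'.format(len(distinct))
        )
    return distinct.pop()

report_date = single_report_date([date(2017, 10, 31), date(2017, 10, 31)])
print(report_date)
```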


## Top sales statewide

Because we want to group our results by more than one field, we'll do this a little differently. We'll use group_by to create a grouped table, then perform an aggregation on that new table to compute the summed Receipts column.

In [11]:
# summing sales statewide for month

print('Total statewide sales for this month are: {}\n'.format(
    mixbev.aggregate(agate.Sum('Receipts')) # <<< I should format better
))

# groups the data based on Establishment, County and City
mixbev_grouped = mixbev.group_by('Establishment').group_by('County').group_by('City')

# computes the sales based on the grouping
state_summary = mixbev_grouped.aggregate([
    ('Sales_sum', agate.Sum('Receipts'))
])

# sorts the results by most sold. We could probably chain it above if we wanted to.
state_summary_sorted = state_summary.order_by('Sales_sum', reverse=True)

# prints the top 10 results
state_summary_sorted.limit(10).print_table(max_column_width=40)


Total statewide sales for this month are: 565676285

| Establishment                            | County  | City        | Sales_sum |
| ---------------------------------------- | ------- | ----------- | --------- |
| RYAN SANDERS SPORTS SERVICES, LLC (92... | Travis  | DEL VALLE   | 2,904,593 |
| AT&T STADIUM (1 LEGENDS WAY)             | Tarrant | ARLINGTON   | 2,819,059 |
| GAYLORD TEXAN (1501 GAYLORD TRL)         | Tarrant | GRAPEVINE   | 1,807,271 |
| ARAMARK SPORTS & ENTERTAINMENT SERVIC... | Harris  | HOUSTON     | 1,617,452 |
| SALC, INC. (2201 N STEMMONS FWY FL 1)    | Dallas  | DALLAS      | 1,555,285 |
| HOSPITALITY INTERNATIONAL, INC. (2380... | Bexar   | SAN ANTONIO | 1,347,636 |
| ARAMARK SPORTS & ENTERTAINMENT SER OF... | Harris  | HOUSTON     | 1,210,323 |
| OMNI DALLAS CONVENTION CENTER (555 S ... | Dallas  | DALLAS      | 1,207,948 |
| LEVY RESTAURANTS (2500 VICTORY AVE)      | Dallas  | DALLAS      | 1,194,981 |
| WLS BEVERAGE CO (110 E 2ND ST)           | Travis  | A
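
On the "I should format better" note in the code: one option is Python's format spec with a thousands separator. The sketch below assumes the aggregate comes back as a `Decimal`, and uses the total printed above as the sample value.

```python
from decimal import Decimal

# Total copied from the output above; assumed to be a Decimal
statewide_total = Decimal('565676285')

# ',' adds thousands separators, '.0f' drops any decimal places
formatted = '{:,.0f}'.format(statewide_total)
print('Total statewide sales for this month are: {}'.format(formatted))
```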

## Austin sales and sums

With this, we reference the location_sum function above, and pass the type of location (City) and the name of the city (AUSTIN). At the same time, we limit the result of that function to the first 50 records, and then print the results. We are basically stringing together a bunch of stuff at once.

In [35]:
# Austin total sales as a city
# This sums the already-grouped table, but it works

print('Total sales in Austin are: {}\n'.format(
    location_sum('City', 'AUSTIN').aggregate(agate.Sum('Receipts_sum'))
))

# uses the location_sum function to filter
austin = location_sum('City', 'AUSTIN')


# print the resulting table
print('Top sellers in Austin are:\n')
austin.limit(50).print_table(max_column_width=60, max_rows=None)

Total sales in Austin are: 64614616

Top sellers in Austin are:

| Establishment                                                | City   | Receipts_sum |
| ------------------------------------------------------------ | ------ | ------------ |
| WLS BEVERAGE CO (110 E 2ND ST)                               | AUSTIN |    1,178,366 |
| BLIND PIG PUB / PIG PEN (317 E 6TH ST)                       | AUSTIN |      613,631 |
| 400 BAR/CUCARACHA/CHUPACABRA/JACKALOPE/MOOSENUCKLE (400 E... | AUSTIN |      543,720 |
| W HOTEL AUSTIN (200 LAVACA ST)                               | AUSTIN |      471,173 |
| ROSE ROOM/ 77 DEGREES (11500 ROCK ROSE AVE)                  | AUSTIN |      464,844 |
| SALC. INC.(HILTON HOTEL) (500 E 4TH ST)                      | AUSTIN |      431,178 |
| THE PALAZIO (501 E BEN WHITE BLVD)                           | AUSTIN |      430,533 |
| STUBB'S BAR-B-Q (801 RED RIVER ST)                           | AUSTIN |      402,490 |
| THE DOGWOOD DOMAIN (11420 ROCK ROSE AVE STE

## More Central Texas cities

In [13]:
location_sum('City', 'BASTROP').limit(5).print_table(max_column_width=60)

| Establishment                                                | City    | Receipts_sum |
| ------------------------------------------------------------ | ------- | ------------ |
| OLD TOWN RESTURANT AND BAR/PINEY CREEK CHOP HOUSE (931 MA... | BASTROP |       68,015 |
| CHILI'S GRILL & BAR (734 HIGHWAY 71 W)                       | BASTROP |       51,414 |
| NEIGHBOR'S (601 CHESTNUT ST UNIT C)                          | BASTROP |       39,112 |
| BACK 9 (834 HIGHWAY 71 W)                                    | BASTROP |       35,749 |
| THE 602 ON MAIN (919 MAIN ST)                                | BASTROP |       27,938 |


In [14]:
location_sum('City', 'BEE CAVE').limit(3).print_table(max_column_width=60)

| Establishment                                         | City     | Receipts_sum |
| ----------------------------------------------------- | -------- | ------------ |
| WOODY TAVERN AND GRILL, INC. (12801 SHOPS PKWY # 100) | BEE CAVE |      119,842 |
| HCG BEVERAGE, LLC (12525 BEE CAVE PKWY)               | BEE CAVE |       89,746 |
| MAUDIE'S HILL COUNTRY, LLC (12506 SHOPS PKWY)         | BEE CAVE |       80,255 |


In [15]:
location_sum('City', 'BUDA').limit(3).print_table(max_column_width=60)

| Establishment                                               | City | Receipts_sum |
| ----------------------------------------------------------- | ---- | ------------ |
| WILLIE'S JOINT (824 MAIN ST)                                | BUDA |       92,781 |
| BUCKS BACKYARD (1750 S FM 1626)                             | BUDA |       55,582 |
| NATE'S AT BUDA MILL AND GRAIN, INC. (306 S MAIN ST STE 101) | BUDA |       52,176 |


In [16]:
location_sum('City', 'DEL VALLE').limit(3).print_table(max_column_width=60)

| Establishment                                                | City      | Receipts_sum |
| ------------------------------------------------------------ | --------- | ------------ |
| RYAN SANDERS SPORTS SERVICES, LLC (9201 CIRCUIT OF THE AM... | DEL VALLE |    2,904,593 |
| CLUB MIS AMIGOS (13820 FM 812)                               | DEL VALLE |        9,498 |
| RESTAURANTES JASMINES (2463 HIGWAY 71 E)                     | DEL VALLE |        4,527 |


In [17]:
location_sum('City', 'CEDAR PARK').limit(3).print_table(max_column_width=60)

| Establishment                                                | City       | Receipts_sum |
| ------------------------------------------------------------ | ---------- | ------------ |
| THE GROVE WINE BAR AND KITCHEN (1310 E WHITESTONE BLVD # ... | CEDAR PARK |      146,565 |
| CHUY'S (4911 183A TOLL RD)                                   | CEDAR PARK |      111,575 |
| PLUCKERS WING BAR (11066 PECAN PARK BLVD STE G)              | CEDAR PARK |      100,934 |


In [18]:
location_sum('City', 'DRIPPING SPRINGS').limit(3).print_table(max_column_width=60)

| Establishment                                             | City             | Receipts_sum |
| --------------------------------------------------------- | ---------------- | ------------ |
| TREATY OAK BREWING AND DISTILLING CO. (16604 FITZHUGH RD) | DRIPPING SPRINGS |       67,637 |
| TRUDY'S FOUR STAR (13059 FOUR STAR BLVD)                  | DRIPPING SPRINGS |       65,010 |
| FLORES MEXICAN RESTAURANT (2440 E HIGHWAY 290 BLDG D)     | DRIPPING SPRINGS |       57,566 |


In [19]:
location_sum('City', 'GEORGETOWN').limit(3).print_table(max_column_width=60)

| Establishment                              | City       | Receipts_sum |
| ------------------------------------------ | ---------- | ------------ |
| EL MONUMENTO (205 W 2ND ST)                | GEORGETOWN |       82,834 |
| GATEWAY BEVERAGES, INC (1101 WOODLAWN AVE) | GEORGETOWN |       73,446 |
| DOS SALSAS CAFE INC (1104 S MAIN ST)       | GEORGETOWN |       66,860 |


In [20]:
location_sum('City', 'KYLE').limit(3).print_table(max_column_width=60)

| Establishment                                           | City | Receipts_sum |
| ------------------------------------------------------- | ---- | ------------ |
| CASA GARCIA'S MEXICAN RESTAURANT (5401 FM 1626 STE 300) | KYLE |       68,336 |
| CENTERFIELD SPORTS BAR & GRILL (200 W CENTER ST)        | KYLE |       41,290 |
| EVO ENTERTAINMENT CENTER (3200 KYLE XING)               | KYLE |       38,776 |


In [21]:
location_sum('City', 'LAGO VISTA').limit(3).print_table(max_column_width=60)

| Establishment                                         | City       | Receipts_sum |
| ----------------------------------------------------- | ---------- | ------------ |
| THE GNARLY GAR (18200 LAKEPOINT CV)                   | LAGO VISTA |       45,615 |
| THE GRILLE AT HIGHLAND LAKES (20552 HIGHLAND LAKE DR) | LAGO VISTA |       18,417 |
| COPPERHEAD GRILL (6115 LOHMANS FORD RD)               | LAGO VISTA |       18,136 |


In [22]:
location_sum('City', 'LAKEWAY').limit(3).print_table(max_column_width=60)

| Establishment                                          | City    | Receipts_sum |
| ------------------------------------------------------ | ------- | ------------ |
| LAKEWAY RESORT AND SPA (101 LAKEWAY DR)                | LAKEWAY |      122,046 |
| THE GROVE WINE BAR AND KITCHEN (3001 RANCH ROAD 620 S) | LAKEWAY |      114,445 |
| THE HILLS AT LAKEWAY (26 CLUB ESTATES PKWY)            | LAKEWAY |      105,234 |


In [23]:
location_sum('City', 'LEANDER').limit(3).print_table(max_column_width=60)

| Establishment                                          | City    | Receipts_sum |
| ------------------------------------------------------ | ------- | ------------ |
| BROOKLYN HEIGHTS PIZZERIA (3550 LAKELINE BLVD STE 135) | LEANDER |       66,322 |
| JARDIN DEL REY (703 S HIGHWAY 183)                     | LEANDER |       37,707 |
| TAPATIA JALISCO #3 LLC (651 N US 183)                  | LEANDER |       12,443 |


In [24]:
location_sum('City', 'LIBERTY HILL').limit(3).print_table(max_column_width=60)

| Establishment                                        | City         | Receipts_sum |
| ---------------------------------------------------- | ------------ | ------------ |
| JARDIN CORONA (15395 W STATE HIGHWAY 29)             | LIBERTY HILL |       39,688 |
| MARGARITA'S RESTAURANT (10280 W STATE HIGHWAY 29)    | LIBERTY HILL |       34,860 |
| ELENAS MEXICAN RESTAURANT (14801 W STATE HIGHWAY 29) | LIBERTY HILL |        6,084 |


In [25]:
location_sum('City', 'PFLUGERVILLE').limit(3).print_table(max_column_width=60)

| Establishment                                  | City         | Receipts_sum |
| ---------------------------------------------- | ------------ | ------------ |
| MAVERICKS (1700 GRAND AVENUE PKWY STE 240)     | PFLUGERVILLE |      118,474 |
| HANOVER'S DRAUGHT HAUS (108 E MAIN ST)         | PFLUGERVILLE |       72,874 |
| RIO GRANDE TEX MEX (15821 CENTRAL COMMERCE DR) | PFLUGERVILLE |       54,604 |


In [26]:
location_sum('City', 'ROUND ROCK').limit(5).print_table(max_column_width=60)

| Establishment                                             | City       | Receipts_sum |
| --------------------------------------------------------- | ---------- | ------------ |
| THIRD BASE ROUND ROCK, LLC (3107 S INTERSTATE 35 STE 810) | ROUND ROCK |      154,984 |
| TWIN PEAKS RESTAURANT (100 LOUIS HENNA BLVD)              | ROUND ROCK |      149,813 |
| RICK'S CABARET (3105 S INTERSTATE 35)                     | ROUND ROCK |      138,443 |
| JACK ALLEN'S KITCHEN (2500 HOPPE TRL)                     | ROUND ROCK |      131,938 |
| PLUCKERS WING BAR (300 MAYS XING STE 300)                 | ROUND ROCK |      117,160 |


In [27]:
location_sum('City', 'SAN MARCOS').limit(5).print_table(max_column_width=60)

| Establishment                                              | City       | Receipts_sum |
| ---------------------------------------------------------- | ---------- | ------------ |
| PLUCKERS WING BAR (105 N INTERSTATE 35)                    | SAN MARCOS |      121,763 |
| THE MARC (120 E SAN ANTONIO ST)                            | SAN MARCOS |      120,155 |
| ZELICKS (336 W HOPKINS ST)                                 | SAN MARCOS |      110,193 |
| 54TH STREET RESTAURANT & DRAFTHOUSE (1303 S INTERSTATE 35) | SAN MARCOS |       92,174 |
| CHIMY'S SAN MARCOS (217 E HOPKINS ST)                      | SAN MARCOS |       91,857 |


In [28]:
location_sum('City', 'SPICEWOOD').limit(3).print_table(max_column_width=60)

| Establishment                                        | City      | Receipts_sum |
| ---------------------------------------------------- | --------- | ------------ |
| ANGEL'S ICEHOUSE (21815 W HWY 71)                    | SPICEWOOD |       58,696 |
| POODIES HILLTOP ROADHOUSE (22308 STATE HIGHWAY 71 W) | SPICEWOOD |       44,614 |
| J5 STEAKHOUSE (21814 STATE HIGHWAY 71 W)             | SPICEWOOD |       22,367 |


In [29]:
location_sum('City', 'SUNSET VALLEY').limit(3).print_table(max_column_width=60)

| Establishment                                          | City          | Receipts_sum |
| ------------------------------------------------------ | ------------- | ------------ |
| DOC'S BACKYARD (5207 BRODIE LN # 100)                  | SUNSET VALLEY |       89,901 |
| BJ'S RESTAURANT AND BREWHOUSE (5207 BRODIE LN STE 300) | SUNSET VALLEY |       64,383 |
| CANE ROSSO (4715 S LAMAR BLVD STE 103)                 | SUNSET VALLEY |       30,800 |


In [30]:
location_sum('City', 'WEST LAKE HILLS').limit(3).print_table(max_column_width=60)

| Establishment                                                | City            | Receipts_sum |
| ------------------------------------------------------------ | --------------- | ------------ |
| LUPE TORTILLA MEXICAN RESTAURANT (701 S CAPITAL OF TEXAS ... | WEST LAKE HILLS |       90,865 |
| CHIPOTLE CHIPOTLE MEXICAN GRILL CHIPOTLE BURRITOS (3300 B... | WEST LAKE HILLS |          743 |


## Sales by county
In this case, we pass in the location type of 'County' and then a county name in caps to get the most sales in a particular county.

In [31]:
# MOVE TO SEPARATE CELLS SO THEY DON'T FORCE A SCROLL
location_sum('County', 'BASTROP').limit(5).print_table(max_column_width=80)
print('\n')
location_sum('County', 'CALDWELL').limit(5).print_table(max_column_width=80)
print('\n')
location_sum('County', 'HAYS').limit(5).print_table(max_column_width=80)
print('\n')
location_sum('County', 'TRAVIS').limit(5).print_table(max_column_width=80)
print('\n')
location_sum('County', 'WILLIAMSON').limit(5).print_table(max_column_width=80)

| Establishment                                                   | County  | Receipts_sum |
| --------------------------------------------------------------- | ------- | ------------ |
| LOST PINES BEVERAGE LLC (575 HYATT LOST PINES ROAD)             | Bastrop |      572,969 |
| OLD TOWN RESTURANT AND BAR/PINEY CREEK CHOP HOUSE (931 MAIN ST) | Bastrop |       68,015 |
| CHILI'S GRILL & BAR (734 HIGHWAY 71 W)                          | Bastrop |       51,414 |
| NEIGHBOR'S (601 CHESTNUT ST UNIT C)                             | Bastrop |       39,112 |
| BACK 9 (834 HIGHWAY 71 W)                                       | Bastrop |       35,749 |


| Establishment                                               | County   | Receipts_sum |
| ----------------------------------------------------------- | -------- | ------------ |
| EL REY BAR/NIGHTCLUB (1721 S COMMERCE ST)                   | Caldwell |       32,103 |
| GUADALAJARA MEXICAN RESTAURANT (1710 S COLORADO ST STE 110) | Caldwell |   

## Sales by ZIP Code
A list of sales by ZIP Code. If anything other than 78701 is at the top, it is news.

In [32]:
# top zip code gross receipts
zip_receipts = mixbev.pivot('Zip', aggregation=agate.Sum('Receipts')).order_by('Sum', reverse=True)
zip_receipts.limit(5).print_table()

| Zip   |        Sum |
| ----- | ---------- |
| 78701 | 25,753,187 |
| 75201 | 12,633,478 |
| 77002 | 11,679,653 |
| 78205 | 10,630,278 |
| 77006 |  7,490,768 |
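
The "is it news" rule could be turned into a quick check. This is a sketch in plain Python using the top rows printed above. The helper name `top_zip_is_news` is made up.

```python
# Top rows copied from the output above: (Zip, Sum)
top_zips = [
    ('78701', 25753187),
    ('75201', 12633478),
    ('77002', 11679653),
]

def top_zip_is_news(rows, expected_top='78701'):
    """Return True when the highest-grossing ZIP is not the expected one."""
    top_zip, _ = max(rows, key=lambda row: row[1])
    return top_zip != expected_top

is_news = top_zip_is_news(top_zips)
print(is_news)
```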
