# Mixed beverage data - September 2017 receipts

- Month studied: September 2017
- File used: October 2017

This Jupyter Notebook analysis uses a [Mixed Beverage Gross Receipts](https://comptroller.texas.gov/taxes/mixed-beverage/receipts.php) file downloaded from the Texas Comptroller's [data center](https://comptroller.texas.gov/transparency/open-data/search-datasets/), then uses a Python library called [agate](http://agate.readthedocs.io/) to clean and process that data for [stories similar to this one](http://www.mystatesman.com/business/austin-alcohol-sales-percent-february/Oo2txZUkuDlqBl0rU9O1lJ/) on monthly alcohol sales.

This is a stripped-down version (compared to the [original fork](https://github.com/utdata/mixbev-receipts)) that skips detailed explanation of steps beyond code comments.

## Historical data

This notebook processes a single file. It is generally copied and then run on a new file. See the [notebooks folder in the repo](https://github.com/statesman/mixbev-receipts/tree/master/notebooks) for a list.

## Get to the goods

Most of the code at the top is setup and processing. If you want to skip to the results, search for:

- Top sales statewide
- Austin sales
- Central Texas cities

### File download

- Go to the [Texas Comptroller data center](https://comptroller.texas.gov/transparency/open-data/search-datasets/), copy the URL of the CSV for this month and enter it below.
- You also need to set the [processing variables](#Processing-variables) for this month.

In [1]:
## %%bash
## downloads the mixedbev file into mixbev-files folder
## You have to set this URL based on location in data center

## these are commented out now. I've been pulling them manually
## and processing from those files
# cd ../test/
# curl -O https://comptroller.texas.gov/auto-data/odc/MIXEDBEV_03_2017.CSV

In [1]:
# imports the libraries we will use
import agate
from decimal import Decimal
import re

# this suppresses the timezone warning
# Might comment out during development so other warnings
# are not suppressed
import warnings
warnings.filterwarnings('ignore')

### Processing variables
Next we set a few values that the rest of the notebook depends on.

- The **`file`** is the name of the file we want to process
- The **`tax_rate`** is the value we need for this file to get the Gross Receipts (vs. the Tax Reported, which is just the tax amount the establishment paid). The comptroller [has information on the tax](https://comptroller.texas.gov/taxes/mixed-beverage/receipts.php), but this [old record layout](https://github.com/utdata/cli-tools/blob/master/data/mixbevtax/OLD-MIXEDBEVTAX-LAYOUT.txt) best describes the math.
- The **`month_studied`** is the YYYY/MM designation for the month before the file release. The file released in February has mostly records from January, but it can also include records from any other month, so we set the specific month we want here. Note there is a check later on that counts the number of records by month, which is worth reviewing.
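To make the `tax_rate` math concrete, here is a minimal sketch of the conversion using only Python's `decimal` module. The helper name `gross_receipts` is made up for this illustration; the formula and the rounding to cents mirror what the processing cell later in this notebook does, and the sample tax amount is taken from the statewide results further down.

```python
from decimal import Decimal

# Tax rate as a percentage, the same value the notebook assigns to tax_rate.
TAX_RATE = Decimal('6.7')


def gross_receipts(report_tax, tax_rate=TAX_RATE):
    """Convert a reported tax amount to gross receipts, rounded to cents."""
    receipts = (report_tax / tax_rate) * 100
    return receipts.quantize(Decimal('0.01'))


# Report Tax for the top establishment in the statewide table.
print(gross_receipts(Decimal('260174.86')))  # 3883206.87
```

The printed value matches the `Sales_sum` shown for that establishment in the statewide table, which is a quick way to confirm the rate is right for the file.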

Here are the files we have downloaded:

In [2]:
ls ../mixbev-files/2017/

MIXEDBEV_01_2017.CSV  MIXEDBEV_05_2017.CSV  MIXEDBEV_09_2017.CSV
MIXEDBEV_02_2017.CSV  MIXEDBEV_06_2017.CSV  MIXEDBEV_10_2017.CSV
MIXEDBEV_03_2017.CSV  MIXEDBEV_07_2017.CSV
MIXEDBEV_04_2017.CSV  MIXEDBEV_08_2017.csv


In [4]:
# this is our source file, which may have been downloaded above
# Swap out the file name here and date below as needed
file = '../mixbev-files/2017/MIXEDBEV_10_2017.CSV'
file_name = ''
online_name = ''

# setting the month_studied var.
# This should be checked in the table below that counts records by month
month_studied = '2017/09'

# Sets the tax rate to convert Report Tax to Gross Receipts
# It's 6.7 since January 1, 2014
tax_rate = Decimal('6.7')
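
Since the file to process is normally the one released the month after `month_studied`, the file name can be derived rather than typed by hand. This is only a sketch: `release_file_name` is a hypothetical helper, and the `MIXEDBEV_MM_YYYY.CSV` pattern is inferred from the files listed above, so it assumes the comptroller keeps that naming.

```python
def release_file_name(month_studied):
    """Return the name of the file released the month after month_studied (YYYY/MM)."""
    year, month = (int(part) for part in month_studied.split('/'))
    # December records arrive in the January file of the following year.
    if month == 12:
        year, month = year + 1, 1
    else:
        month += 1
    return 'MIXEDBEV_{:02d}_{}.CSV'.format(month, year)


print(release_file_name('2017/09'))  # MIXEDBEV_10_2017.CSV
```

Note that one file in the listing above has a lowercase `.csv` extension, so a real lookup may need to be case-insensitive.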

### import and processing

In [5]:
# sets the column names of the original data set.
column_names = [
    'TABC Permit Number',
    'Trade Name',
    'Location Address',
    'Location City',
    'Location State',
    'Location Zip Code',
    'Location County Code',
    'Blank',
    'Report Period',
    'Report Tax'
]
# Helps us import some text fields that may be considered numbers in error.
specified_types = {
    'Location Zip Code': agate.Text(),
    'Location County Code': agate.Text()
}

# this imports the file specified above, along with the proper types
mixbev_raw = agate.Table.from_csv(file, column_names, encoding='iso-8859-1', column_types=specified_types)

# mixbev_trim creates a new interim table with results of compute function
# that takes the four columns that need trimming and strips them of white space,
# adding them to the end of the table with new names.
# The last computation does the math to create the Gross Receipts based on the tax_rate set above

mixbev_trim = mixbev_raw.compute([
    ('Permit', agate.Formula(agate.Text(), lambda r: r['TABC Permit Number'].strip())),
    ('Name', agate.Formula(agate.Text(), lambda r: r['Trade Name'].strip())),
    ('Address', agate.Formula(agate.Text(), lambda r: r['Location Address'].strip())),
    ('City', agate.Formula(agate.Text(), lambda r: r['Location City'].strip())),
    ('Receipts_compute', agate.Formula(agate.Number(), lambda r: (r['Report Tax'] / tax_rate) * 100))
])

# the Receipts_compute computation above returns as a decimal number,
# so this function rounds those numbers.
# I might refactor this later so I can use it elsewhere.
def round_receipt(row):
    return row['Receipts_compute'].quantize(Decimal('0.01'))

# This compute method uses the round_receipt function above,
# putting the results into a new table.
mixbev_round = mixbev_trim.compute([
    ('Receipts', agate.Formula(agate.Number(), round_receipt))
])

# creates new table, selecting just the columns we need
# then renames some of them for ease later.
mixbev_cleaned = mixbev_round.select([
    'Permit',
    'Name',
    'Address',
    'City',
    'Location State',
    'Location Zip Code',
    'Location County Code',
    'Report Period',
    'Report Tax',
    'Receipts'
]).rename(column_names = {
    'Location State': 'State',
    'Location Zip Code': 'Zip',
    'Location County Code': 'CountyCode',
    'Report Period': 'Period',
    'Report Tax': 'Tax'
})

# Concatenates the name and address
mixbev_cleaned_est = mixbev_cleaned.compute([
    ('Establishment', agate.Formula(agate.Text(), lambda row: '%(Name)s %(Address)s' % row))
])

# importing counties.csv, ensuring that the 'code' column is text
counties = agate.Table.from_csv('../resource-files/counties.csv', column_types={'code': agate.Text()})

# joins the counties table to the mixed bev cleaned data with establishments
mixbev_joined = mixbev_cleaned_est.join(counties, 'CountyCode', 'code')

# get just the columns we need and rename county
# THIS is the finished, cleaned mixbev table
mixbev = mixbev_joined.select([
    'Permit',
    'Name',
    'Address',
    'Establishment',
    'City',
    'State',
    'Zip',
    'county',
    'Period',
    'Tax',
    'Receipts'
]).rename(column_names = {
    'county': 'County'
})
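
The reason for forcing the ZIP and county code columns to text can be shown with a small standard-library sketch. The sample codes below are made up, and whether the real `counties.csv` codes carry leading zeros is an assumption; the point is only that a code read as a number no longer matches the same code stored as text, which would break the join.

```python
import csv
import io

# Hypothetical sample data; these codes are for illustration only.
sample = io.StringIO("county_code\n005\n227\n")
rows = list(csv.DictReader(sample))

# Read as text, the codes keep their leading zeros.
as_text = [row['county_code'] for row in rows]
# Cast to numbers, the leading zeros are lost, so '005' becomes 5.
as_number = [int(row['county_code']) for row in rows]

print(as_text)    # ['005', '227']
print(as_number)  # [5, 227]
```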


### Looking at dates of the records

This basically confirms that the file has multiple dates, and that we are looking at the right month of data. Typically a data set will have mostly reports from the previous month, but there are always also submissions from other months. We want to filter out those other months, which we do based on the `month_studied` variable set near the top of the file, which should match the period at the top of the table below.


In [6]:
# Pivot the mixbev table by Period. By default it gives a Count of the records
# We then order the table by Count in descending order
by_period = mixbev.pivot('Period').order_by('Count', reverse=True)

# prints the table of period and number of records
by_period.limit(5).print_table(max_rows=None)

| Period  |  Count |
| ------- | ------ |
| 2017/09 | 14,469 |
| 2017/08 |  1,591 |
| 2017/07 |    170 |
| 2017/06 |     50 |
| 2017/05 |     29 |
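
The visual check against the table can also be written as a guard that fails loudly when `month_studied` is not the most common period. This is a sketch in plain Python rather than agate, with the counts copied from the table above; in the notebook the counts would come from the `by_period` table instead.

```python
from collections import Counter

month_studied = '2017/09'

# Record counts copied from the period table above.
period_counts = Counter({
    '2017/09': 14469,
    '2017/08': 1591,
    '2017/07': 170,
    '2017/06': 50,
    '2017/05': 29,
})

# The period with the most records should be the month we intend to study.
top_period, top_count = period_counts.most_common(1)[0]
if top_period != month_studied:
    raise ValueError(
        'month_studied is {} but most records are from {}'.format(month_studied, top_period)
    )

print(top_period, top_count)  # 2017/09 14469
```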


In [7]:
## filters the records to our month_studied
mixbev_month = mixbev.where(lambda row: row['Period'] == month_studied)

# function to group sales by a specific location
# City or County passed in should be ALL CAPS
# Location_type can be 'City' or 'County'

def location_sum(location_type, location):
    # Filters the data to the specified location
    location_filtered = mixbev_month.where(lambda row: row[location_type].upper() == location)

    # groups the data based on Establishment and location
    location_grouped = location_filtered.group_by('Establishment').group_by(location_type)
    # computes the sales based on the grouping
    location_summary = location_grouped.aggregate([
        ('Tax_sum', agate.Sum('Tax')),
        ('Receipts_sum', agate.Sum('Receipts'))
    ])
    
    # sorts the results by most sold
    location_summary_sorted = location_summary.order_by('Receipts_sum', reverse=True)

    # returns the full sorted table; callers limit and print it
    return location_summary_sorted


## Top sales statewide

Because we want to group our results by more than one field and perform more than one aggregation, we'll do this a little differently. We'll use `group_by` to create a grouped table, then perform aggregations on that new table to compute the Tax and Receipts columns.
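
As a rough illustration of that group-then-aggregate pattern outside agate, here is a plain-Python sketch. The records are invented for the example, and the dictionary-based grouping is a stand-in for agate's grouped tables, not how agate works internally.

```python
from collections import defaultdict
from decimal import Decimal

# Hypothetical records: (establishment, city, tax, receipts).
records = [
    ('BAR A 1 MAIN ST', 'AUSTIN', Decimal('6.70'), Decimal('100.00')),
    ('BAR A 1 MAIN ST', 'AUSTIN', Decimal('13.40'), Decimal('200.00')),
    ('BAR B 2 ELM ST', 'BUDA', Decimal('3.35'), Decimal('50.00')),
]

# Group on the (establishment, city) pair and keep two running sums per group.
totals = defaultdict(lambda: {'Tax_sum': Decimal('0'), 'Sales_sum': Decimal('0')})
for establishment, city, tax, receipts in records:
    group = totals[(establishment, city)]
    group['Tax_sum'] += tax
    group['Sales_sum'] += receipts

# Sort the groups by sales, largest first.
ranked = sorted(totals.items(), key=lambda item: item[1]['Sales_sum'], reverse=True)
for (establishment, city), sums in ranked:
    print(establishment, city, sums['Tax_sum'], sums['Sales_sum'])
```

The agate cell that follows does the same thing with chained `group_by` calls and a list of named aggregations.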

In [8]:
# groups the data based on Establishment, County and City
mixbev_grouped = mixbev_month.group_by('Establishment').group_by('County').group_by('City')

# computes the sales based on the grouping
state_summary = mixbev_grouped.aggregate([
    ('Tax_sum', agate.Sum('Tax')),
    ('Sales_sum', agate.Sum('Receipts'))
])

# sorts the results by most sold. We could probably chain it above if we wanted to.
state_summary_sorted = state_summary.order_by('Sales_sum', reverse=True)

# summing sales statewide for month
print('\nTotal sales across the state for the given month: {}\n'.format(
    mixbev_month.aggregate(agate.Sum('Receipts'))
))

print('Top sales by establishment statewide\n')

# prints the top 10 results
state_summary_sorted.limit(10).exclude('County').print_table(max_column_width=45)


Total sales across the state for the given month: 552027608.41

Top sales by establishment statewide

| Establishment                                 | City        |    Tax_sum |    Sales_sum |
| --------------------------------------------- | ----------- | ---------- | ------------ |
| AT&T STADIUM 1 LEGENDS WAY                    | ARLINGTON   | 260,174.86 | 3,883,206.87 |
| GAYLORD TEXAN 1501 GAYLORD TRL                | GRAPEVINE   | 116,559.63 | 1,739,695.97 |
| HOSPITALITY INTERNATIONAL, INC 23808 RESOR... | SAN ANTONIO |  92,213.50 | 1,376,320.90 |
| WLS BEVERAGE CO 110 E 2ND ST                  | AUSTIN      |  73,719.56 | 1,100,291.94 |
| OMNI DALLAS CONVENTION CENTER 555 S LAMAR ST  | DALLAS      |  65,863.54 |   983,037.91 |
| ARAMARK SPORTS & ENTERTAINMENT 501 CRAWFOR... | HOUSTON     |  63,577.50 |   948,917.91 |
| METROPLEX SPORTSERVICE, INC. 1000 BALLPARK... | ARLINGTON   |  60,463.27 |   902,436.87 |
| RYAN SANDERS SPORTS SERVICES, 9201 CIRCUIT... | AUSTIN      |  58,5

## Austin sales and sums

With this, we reference the `location_sum` function above, passing the type of location (City) and the name of the city (AUSTIN). We then limit the result of that function to the first 10 records and print the results, chaining several steps together at once.

In [9]:
# uses the location_sum function to filter
austin = location_sum('City', 'AUSTIN')

print('\nTotal sales in Austin for the given month: {}\n'.format(
    austin.aggregate(agate.Sum('Receipts_sum'))
))

# print the resulting table
austin.limit(10).print_table(max_column_width=50)


Total sales in Austin for the given month: 64935744.42

| Establishment                                      | City   |   Tax_sum | Receipts_sum |
| -------------------------------------------------- | ------ | --------- | ------------ |
| WLS BEVERAGE CO 110 E 2ND ST                       | AUSTIN | 73,719.56 | 1,100,291.94 |
| RYAN SANDERS SPORTS SERVICES, 9201 CIRCUIT OF T... | AUSTIN | 58,541.92 |   873,760.00 |
| W HOTEL AUSTIN 200 LAVACA ST                       | AUSTIN | 49,415.51 |   737,544.93 |
| ROSE ROOM/ 77 DEGREES 11500 ROCK ROSE AVE          | AUSTIN | 39,240.96 |   585,685.97 |
| 400 BAR/CUCARACHA/CHUPACABRA/J 400 E 6TH ST        | AUSTIN | 33,724.58 |   503,351.94 |
| BLIND PIG PUB / PIG PEN 317 E 6TH ST               | AUSTIN | 33,663.48 |   502,440.00 |
| THE DOGWOOD DOMAIN 11420 ROCK ROSE AVE STE 700     | AUSTIN | 26,688.24 |   398,331.94 |
| HOTEL VAN ZANDT 605 DAVIS ST                       | AUSTIN | 26,357.39 |   393,393.88 |
| THE PALAZIO 501 E BEN WH

## More Central Texas cities

In [10]:
location_sum('City', 'BASTROP').limit(3).print_table(max_column_width=40)

| Establishment                            | City    |  Tax_sum | Receipts_sum |
| ---------------------------------------- | ------- | -------- | ------------ |
| OLD TOWN RESTURANT AND BAR/PIN 931 MA... | BASTROP | 4,096.71 |    61,144.93 |
| BACK 9 834 HIGHWAY 71 W                  | BASTROP | 3,098.61 |    46,247.91 |
| CHILI'S GRILL & BAR 734 HIGHWAY 71 W     | BASTROP | 2,690.92 |    40,162.99 |


In [11]:
location_sum('City', 'BEE CAVE').limit(3).print_table(max_column_width=40)

| Establishment                            | City     |  Tax_sum | Receipts_sum |
| ---------------------------------------- | -------- | -------- | ------------ |
| WOODY TAVERN AND GRILL, INC. 12801 SH... | BEE CAVE | 7,375.02 |   110,074.93 |
| CAFE BLUE 12800 HILL COUNTRY BLVD STE    | BEE CAVE | 5,869.66 |    87,606.87 |
| MAUDIE'S HILL COUNTRY, LLC 12506 SHOP... | BEE CAVE | 5,526.29 |    82,481.94 |


In [12]:
location_sum('City', 'BUDA').limit(3).print_table(max_column_width=40)

| Establishment                            | City |  Tax_sum | Receipts_sum |
| ---------------------------------------- | ---- | -------- | ------------ |
| BUCKS BACKYARD 1750 S FM 1626            | BUDA | 5,774.32 |    86,183.88 |
| WILLIE'S JOINT 824 MAIN ST               | BUDA | 4,580.79 |    68,370.00 |
| NATE'S AT BUDA MILL AND GRAIN, 306 S ... | BUDA | 3,717.69 |    55,487.91 |


In [13]:
location_sum('City', 'CEDAR PARK').limit(3).print_table(max_column_width=40)

| Establishment                            | City       |   Tax_sum | Receipts_sum |
| ---------------------------------------- | ---------- | --------- | ------------ |
| THE GROVE WINE BAR AND KITCHEN 1310 E... | CEDAR PARK | 11,189.40 |   167,005.97 |
| CHUY'S 4911 183A TOLL RD                 | CEDAR PARK |  9,385.29 |   140,078.96 |
| LUPE TORTILLA MEXICAN RESTAURA 4501 1... | CEDAR PARK |  6,885.79 |   102,772.99 |


In [14]:
location_sum('City', 'DRIPPING SPRINGS').limit(3).print_table(max_column_width=40)

| Establishment                            | City             |  Tax_sum | Receipts_sum |
| ---------------------------------------- | ---------------- | -------- | ------------ |
| TREATY OAK BREWING AND DISTILL 16604 ... | DRIPPING SPRINGS | 6,776.11 |   101,135.97 |
| DEEP EDDY DISTILLING CO 2250 E HIGHWA... | DRIPPING SPRINGS | 4,726.91 |    70,550.90 |
| TRUDY'S FOUR STAR 13059 FOUR STAR BLVD   | DRIPPING SPRINGS | 4,496.90 |    67,117.91 |


In [15]:
location_sum('City', 'GEORGETOWN').limit(3).print_table(max_column_width=40)

| Establishment                            | City       |  Tax_sum | Receipts_sum |
| ---------------------------------------- | ---------- | -------- | ------------ |
| EL MONUMENTO 205 W 2ND ST                | GEORGETOWN | 5,850.10 |    87,314.93 |
| HARDTAILS 1515 N IH 35                   | GEORGETOWN | 5,387.26 |    80,406.87 |
| GATEWAY BEVERAGES, INC 1101 WOODLAWN AVE | GEORGETOWN | 4,862.45 |    72,573.88 |


In [16]:
location_sum('City', 'KYLE').limit(3).print_table(max_column_width=40)

| Establishment                            | City |  Tax_sum | Receipts_sum |
| ---------------------------------------- | ---- | -------- | ------------ |
| CASA GARCIA'S MEXICAN RESTAURA 5401 F... | KYLE | 5,093.27 |    76,018.96 |
| EVO ENTERTAINMENT CENTER 3200 KYLE XING  | KYLE | 3,410.56 |    50,903.88 |
| CENTERFIELD SPORTS BAR & GRILL 200 W ... | KYLE | 2,709.74 |    40,443.88 |


In [17]:
location_sum('City', 'LAGO VISTA').limit(3).print_table(max_column_width=40)

| Establishment                            | City       |  Tax_sum | Receipts_sum |
| ---------------------------------------- | ---------- | -------- | ------------ |
| COPPERHEAD GRILL 6115 LOHMANS FORD RD    | LAGO VISTA | 1,111.06 |    16,582.99 |
| MARIA'S BAR & GRILL MEXICAN RE 20602 ... | LAGO VISTA |   453.59 |     6,770.00 |
| THE GRILLE AT HIGHLAND LAKES 20552 HI... | LAGO VISTA |   322.80 |     4,817.91 |


In [18]:
location_sum('City', 'LAKEWAY').limit(3).print_table(max_column_width=40)

| Establishment                            | City    |  Tax_sum | Receipts_sum |
| ---------------------------------------- | ------- | -------- | ------------ |
| THE GROVE WINE BAR AND KITCHEN 3001 R... | LAKEWAY | 7,815.88 |   116,654.93 |
| LAKEWAY RESORT AND SPA 101 LAKEWAY DR    | LAKEWAY | 7,710.69 |   115,084.93 |
| SANTA CANTARINA RESTAURANT 1310 RANCH... | LAKEWAY | 3,813.90 |    56,923.88 |


In [19]:
location_sum('City', 'LEANDER').limit(3).print_table(max_column_width=40)

| Establishment                            | City    |  Tax_sum | Receipts_sum |
| ---------------------------------------- | ------- | -------- | ------------ |
| BROOKLYN HEIGHTS PIZZERIA 3550 LAKELI... | LEANDER | 4,345.68 |    64,860.90 |
| JARDIN DEL REY 703 S HIGHWAY 183         | LEANDER | 2,635.51 |    39,335.97 |
| TAPATIA JALISCO #3 LLC 651 N US 183      | LEANDER |   883.99 |    13,193.88 |


In [20]:
location_sum('City', 'LIBERTY HILL').limit(3).print_table(max_column_width=40)

| Establishment                            | City         |  Tax_sum | Receipts_sum |
| ---------------------------------------- | ------------ | -------- | ------------ |
| JARDIN CORONA 15395 W STATE HIGHWAY 29   | LIBERTY HILL | 3,049.97 |    45,521.94 |
| MARGARITA'S RESTAURANT 10280 W STATE ... | LIBERTY HILL | 2,444.49 |    36,484.93 |
| ELENAS MEXICAN RESTAURANT 14801 W STA... | LIBERTY HILL |   413.32 |     6,168.96 |


In [21]:
location_sum('City', 'PFLUGERVILLE').limit(3).print_table(max_column_width=40)

| Establishment                          | City         |   Tax_sum | Receipts_sum |
| -------------------------------------- | ------------ | --------- | ------------ |
| MAVERICKS 1700 GRAND AVENUE PKWY STE 2 | PFLUGERVILLE | 10,975.33 |   163,810.90 |
| HANOVER'S DRAUGHT HAUS 108 E MAIN ST   | PFLUGERVILLE |  4,944.26 |    73,794.93 |
| MARSHALL'S TAVERN 113 E PECAN ST       | PFLUGERVILLE |  4,144.01 |    61,850.90 |


In [22]:
location_sum('City', 'ROUND ROCK').limit(5).print_table(max_column_width=40)

| Establishment                            | City       |   Tax_sum | Receipts_sum |
| ---------------------------------------- | ---------- | --------- | ------------ |
| THIRD BASE ROUND ROCK, LLC 3107 S INT... | ROUND ROCK | 10,062.46 |   150,185.97 |
| RICK'S CABARET 3105 S INTERSTATE 35      | ROUND ROCK |  9,765.11 |   145,747.91 |
| CHUY'S ROUND ROCK 2320 N INTERSTATE 35   | ROUND ROCK |  9,705.55 |   144,858.96 |
| FAST EDDIE'S NEIGHBORHOOD BILL 100 PA... | ROUND ROCK |  9,523.84 |   142,146.87 |
| TWIN PEAKS RESTAURANT 100 LOUIS HENNA... | ROUND ROCK |  8,521.06 |   127,180.00 |


In [23]:
location_sum('City', 'SAN MARCOS').limit(5).print_table(max_column_width=40)

| Establishment                         | City       |   Tax_sum | Receipts_sum |
| ------------------------------------- | ---------- | --------- | ------------ |
| THE MARC 120 E SAN ANTONIO ST         | SAN MARCOS | 10,200.81 |   152,250.90 |
| ZELICKS 336 W HOPKINS ST              | SAN MARCOS |  7,861.11 |   117,330.00 |
| CHIMY'S SAN MARCOS 217 E HOPKINS ST   | SAN MARCOS |  7,818.96 |   116,700.90 |
| MAYLOO'S 138 N LBJ DR                 | SAN MARCOS |  7,144.00 |   106,626.87 |
| CRAFTHOUSE KITCHEN & TAP 242 N LBJ DR | SAN MARCOS |  7,001.16 |   104,494.93 |


In [24]:
location_sum('City', 'SPICEWOOD').limit(3).print_table(max_column_width=40)

| Establishment                            | City      |  Tax_sum | Receipts_sum |
| ---------------------------------------- | --------- | -------- | ------------ |
| ANGEL'S ICEHOUSE 21815 W HWY 71          | SPICEWOOD | 4,040.16 |    60,300.90 |
| POODIES HILLTOP ROADHOUSE 22308 STATE... | SPICEWOOD | 3,008.30 |    44,900.00 |
| APIS RESTAURANT 23526 STATE HIGHWAY 71 W | SPICEWOOD | 1,680.15 |    25,076.87 |


In [25]:
location_sum('City', 'SUNSET VALLEY').limit(3).print_table(max_column_width=40)

| Establishment                            | City          |  Tax_sum | Receipts_sum |
| ---------------------------------------- | ------------- | -------- | ------------ |
| DOC'S BACKYARD 5207 BRODIE LN # 100      | SUNSET VALLEY | 6,076.43 |    90,692.99 |
| BJ'S RESTAURANT AND BREWHOUSE 5207 BR... | SUNSET VALLEY | 4,085.79 |    60,981.94 |
| CANE ROSSO 4715 S LAMAR BLVD STE 103     | SUNSET VALLEY | 1,941.39 |    28,975.97 |


In [26]:
location_sum('City', 'WEST LAKE HILLS').limit(3).print_table(max_column_width=40)

| Establishment                            | City            |  Tax_sum | Receipts_sum |
| ---------------------------------------- | --------------- | -------- | ------------ |
| LUPE TORTILLA MEXICAN RESTAURA 701 S ... | WEST LAKE HILLS | 6,036.36 |    90,094.93 |
| CHIPOTLE CHIPOTLE MEXICAN GRIL 3300 B... | WEST LAKE HILLS |    52.32 |       780.90 |


## Sales by county example

In this case, we pass in the location type of 'County' and then a county name in caps to get the most sales in a particular county.

In [27]:
location_sum('County', 'CALDWELL').limit(3).print_table(max_column_width=40)

| Establishment                            | County   |  Tax_sum | Receipts_sum |
| ---------------------------------------- | -------- | -------- | ------------ |
| RISKY BUSINESS 211 E MARKET ST           | Caldwell | 1,385.62 |    20,680.90 |
| GUADALAJARA MEXICAN RESTAURANT 1710 S... | Caldwell | 1,296.78 |    19,354.93 |
| THE PEARL 110 N MAIN ST                  | Caldwell | 1,035.21 |    15,450.90 |
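
The output shows the county stored in title case (Caldwell) even though the function was called with CALDWELL. That works because `location_sum` upper-cases the stored value before comparing. A small sketch of that comparison follows; `matches_location` is a hypothetical stand-in for the lambda inside `location_sum`, and the list of county names is sample data.

```python
def matches_location(value, location):
    """Compare a stored place name against an ALL CAPS search term."""
    return value.upper() == location


# Sample county names in the title case seen in the output above.
county_names = ['Caldwell', 'Hays', 'Travis']

# An ALL CAPS search term matches the title-case value.
print([name for name in county_names if matches_location(name, 'CALDWELL')])  # ['Caldwell']
# A title-case search term does not, because only the stored value is upper-cased.
print([name for name in county_names if matches_location(name, 'Caldwell')])  # []
```

This is why the location passed in has to be ALL CAPS: the search term itself is never normalized.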


## Sales by ZIP Code
Just making sure that 78701 is at the top of this list, which it has been every month for a decade.

In [28]:
# top zip code gross receipts
zip_receipts = mixbev_month.pivot('Zip', aggregation=agate.Sum('Receipts')).order_by('Sum', reverse=True)
zip_receipts.limit(5).print_table()

| Zip   |           Sum |
| ----- | ------------- |
| 78701 | 25,102,730.30 |
| 75201 | 12,931,342.05 |
| 78205 |  8,990,296.08 |
| 76011 |  7,427,398.01 |
| 77002 |  7,323,262.45 |
