# Processing mixed beverage data

## Single-month sales report

This Jupyter Notebook downloads [Mixed Beverage Gross Receipts data](https://data.texas.gov/Government-and-Taxes/Mixed-Beverage-Gross-Receipts/naix-2893) from data.texas.gov, then uses a Python library called [agate](http://agate.readthedocs.io/) to clean and process that data for [stories similar to this one](http://www.mystatesman.com/business/austin-alcohol-sales-percent-february/Oo2txZUkuDlqBl0rU9O1lJ/) on monthly alcohol sales.

There are two places where the **last day of the month** studied has to be entered into the script, in YYYY-MM-DD format:
- `download_date` in the "Download" section
- `date_studied` in the "Set date_studied" section

For example, for April 2018, use:

`2018-04-30`
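
The last day of a month can also be computed instead of typed by hand. A minimal sketch using only the standard library; `last_day_of_month` is a hypothetical helper and is not part of this notebook:

```python
import calendar
from datetime import date


def last_day_of_month(year, month):
    """Return the last day of the given month as a YYYY-MM-DD string."""
    # monthrange returns (weekday of the first day, number of days in the month)
    days_in_month = calendar.monthrange(year, month)[1]
    return date(year, month, days_in_month).isoformat()


print(last_day_of_month(2018, 4))
```

The returned string could be pasted into both `download_date` and `date_studied`.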

Last run on: 2018-06-18

## Download

We're using the bash curl command to download our file from Socrata.

### Set download date

Make sure to set the `download_date` below to the last day of the month studied.

In [1]:
%%bash
download_date='2018-04-30'

socrata_data='https://data.texas.gov/resource/fp9t-htqh.json?'
socrata_limit='$limit=20000'
socrata_date='&$where=obligation_end_date_yyyymmdd%20=%20%27'$download_date'%27'
socrata_path=$socrata_data$socrata_limit$socrata_date

echo "$socrata_path"

# quote the path and URL so the shell does not glob or split them
curl -L -o "../mixbev-files/Mixed_Beverage_Gross_Receipts_$download_date.json" \
"$socrata_path"

https://data.texas.gov/resource/fp9t-htqh.json?$limit=20000&$where=obligation_end_date_yyyymmdd%20=%20%272018-04-30%27


  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 11.3M    0 11.3M    0     0   592k      0 --:--:--  0:00:19 --:--:--  620k
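
For readers who prefer to stay in Python, the same request URL can be assembled with the standard library. This is only a sketch of how the `$limit` and `$where` pieces fit together; `build_socrata_url` is a hypothetical helper, and the notebook itself uses the bash cell above:

```python
from urllib.parse import quote

BASE_URL = 'https://data.texas.gov/resource/fp9t-htqh.json'


def build_socrata_url(download_date, limit=20000):
    """Build the query URL for one month of data (YYYY-MM-DD end date)."""
    # The filter as it would be written unencoded
    where = "obligation_end_date_yyyymmdd = '{}'".format(download_date)
    # Percent-encode spaces and quotes, but leave '=' as in the bash version
    encoded_where = quote(where, safe='=')
    return '{}?$limit={}&$where={}'.format(BASE_URL, limit, encoded_where)


print(build_socrata_url('2018-04-30'))
```

The printed URL should match the one echoed by the bash cell.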


## Set date_studied

It is important to change `date_studied` to the last day of the month studied:

In [2]:
# this should be the same as the download_date above.
date_studied = '2018-04-30'

## Set up python

We'll import the Python libraries we need. The `date_studied` value set above is used later to find the right file.

In [3]:
# import libraries
import agate
# this suppresses the timezone warning
# Might comment out during development so other warnings
# are not suppressed
import warnings
warnings.filterwarnings('ignore')

## Import the file

We set the path name based on `date_studied` above. We import the raw data, forcing some data types.

In [4]:
# This is the source file, which should already be downloaded
file_path = '../mixbev-files/Mixed_Beverage_Gross_Receipts_' + date_studied + '.json'

# Setting up to force column types for some columns
specified_types = {
    'cover_charge_receipts': agate.Number(),
    'taxpayer_number': agate.Text(),
    'taxpayer_zip': agate.Text(),
    'taxpayer_county': agate.Text(),
    'location_number': agate.Text(),
    'location_address': agate.Text(),
    'location_zip': agate.Text(),
    'location_county': agate.Text(),
}

# this imports the file specified above, along with the proper types
mixbev_raw = agate.Table.from_json(file_path, column_types=specified_types)

# Uncomment below to peek at column names
# print(mixbev_raw)

## Cleaning bad columns

There is a problem with the Comptroller data where two columns are incorrectly labeled. They are NOT used in the analysis, but we'll fix them anyway. It could be that the Comptroller will fix these in the future.

In [5]:
# renames two columns that are labelled incorrectly from original data
mixbev_fixed = mixbev_raw.rename(column_names = {
        'tabc_permit_number': 'inside_outside_city_limits_code',
        'inside_outside_city_limits_code_y_n': 'tabc_permit',
    }
)

# Uncomment below to see changed columns
# print(mixbev_fixed)

In [6]:
# establish our table of record
mixbev = mixbev_fixed

## Create establishment column

We combine the name and address so that each location counts as a single establishment, instead of grouping together locations that share a trade name, like 'CHILI'S BAR & GRILL', but have different addresses.

In [7]:
# Concatenates the name and address
mixbev_establishment = mixbev.compute([
    ('Establishment', agate.Formula(agate.Text(), lambda row: '%(location_name)s (%(location_address)s)' % row))
])

# Uncomment line below to print Establishment to check what it looks like
# mixbev_establishment.select('Establishment').limit(5).print_table(max_column_width=80)
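
To illustrate why the name and address are combined, here is a small standard-library sketch. The two rows and their addresses are made up for illustration, and `establishment` is a hypothetical stand-in for the formula above:

```python
from collections import Counter

# Made-up rows: one trade name at two different addresses
rows = [
    {'location_name': "CHILI'S BAR & GRILL", 'location_address': '100 EXAMPLE ST'},
    {'location_name': "CHILI'S BAR & GRILL", 'location_address': '200 SAMPLE AVE'},
]


def establishment(row):
    """Same string formatting as the agate Formula, applied to a plain dict."""
    return '%(location_name)s (%(location_address)s)' % row


# Grouping by name alone merges both locations into one key
by_name = Counter(row['location_name'] for row in rows)
# Grouping by name plus address keeps them apart
by_establishment = Counter(establishment(row) for row in rows)

print(len(by_name), len(by_establishment))
```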

In [8]:
# Uncomment below to see column names
# print(mixbev_establishment)

## Import and merge counties lookup table

We do this to get county names. I got this list from the Comptroller.

NOTE: Wisdom would suggest we join on the `code` column from counties, but the data.texas.gov data does not have the zero padding that those values have, so I'm using the `id` column.

In [9]:
# importing counties.csv, ensuring that the 'code' and 'id' columns are text
counties = agate.Table.from_csv('../resource-files/counties.csv', column_types={
    'code': agate.Text(),
    'id': agate.Text()
})

# uncomment below to peek at the column names and an example
# print(counties)
# counties.limit(5).print_table()

# joins the counties table to the mixed bev cleaned data with establishments
mixbev_joined = mixbev_establishment.join(counties, 'location_county', 'id')

# uncomment below to see merged column names
# print(mixbev_joined)
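
The zero-padding mismatch mentioned in the note could, in principle, be handled by padding the unpadded values before a lookup. A sketch only: `pad_county_code` is a hypothetical helper, the three-digit width is an assumption about `counties.csv`, and the county names are placeholders:

```python
def pad_county_code(value, width=3):
    """Left-pad a county identifier with zeros, e.g. '5' becomes '005'."""
    # width=3 is an assumption; adjust it to match the lookup table
    return str(value).strip().zfill(width)


# Placeholder lookup keyed on zero-padded codes
counties_by_code = {'005': 'Example County A', '227': 'Example County B'}

print(counties_by_code[pad_county_code('5')])
print(counties_by_code[pad_county_code(227)])
```

The notebook avoids this step entirely by joining on the `id` column instead.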

In [10]:
# get just the columns we need and rename county
# THIS is the finished, cleaned mixbev table
mixbev = mixbev_joined.select([
    'location_name',
    'location_address',
    'Establishment',
    'location_city',
    'location_state',
    'location_zip',
    'county',
    'total_receipts',
    'obligation_end_date_yyyymmdd'
]).rename(column_names = {
    'location_name' : 'Name',
    'location_address' : 'Address',
    'location_city': 'City',
    'location_state': 'State',
    'location_zip': 'Zip',
    'total_receipts' : 'Receipts',
    'county': 'County',
    'obligation_end_date_yyyymmdd': 'Report date'
})

# peek at the column names
print(mixbev)

| column        | data_type |
| ------------- | --------- |
| Name          | Text      |
| Address       | Text      |
| Establishment | Text      |
| City          | Text      |
| State         | Text      |
| Zip           | Text      |
| County        | Text      |
| Receipts      | Number    |
| Report date   | DateTime  |



## Location sums function

Because we want to get the top sellers in a bunch of cities and counties, we create a function so we don't have to repeat the code. This function allows us to pass in a city or county name to filter the monthly receipts table and then sum the Receipts column for each establishment. The result can then be acted on to print or aggregate. It is used later in the file.

In [11]:
# function to group sales by a specific location
# City or County passed in should be ALL CAPS
# location_type can be 'City' or 'County'

def location_sum(location_type, location):
    # filters the data to the specified city or county
    location_filtered = mixbev.where(lambda row: row[location_type].upper() == location)

    # groups the data based on Establishment and location
    location_grouped = location_filtered.group_by('Establishment').group_by(location_type)
    # computes the sales based on the grouping
    location_summary = location_grouped.aggregate([
        ('Receipts_sum', agate.Sum('Receipts'))
    ])

    # sorts the results by most sold
    location_summary_sorted = location_summary.order_by('Receipts_sum', reverse=True)

    # returns the sorted table; the caller limits and prints it
    return location_summary_sorted
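
The filter, group, sum and sort steps in this function can be sketched in plain Python to show what each stage does. This is an illustration on made-up rows, not the agate implementation; `location_sum_plain` and `sample_rows` are hypothetical names:

```python
from collections import defaultdict
from decimal import Decimal

# Made-up rows shaped like the cleaned mixbev table
sample_rows = [
    {'Establishment': 'A BAR (1 MAIN ST)', 'City': 'AUSTIN', 'County': 'Travis', 'Receipts': Decimal('100')},
    {'Establishment': 'A BAR (1 MAIN ST)', 'City': 'AUSTIN', 'County': 'Travis', 'Receipts': Decimal('50')},
    {'Establishment': 'B GRILL (2 OAK ST)', 'City': 'AUSTIN', 'County': 'Travis', 'Receipts': Decimal('300')},
    {'Establishment': 'C PUB (3 ELM ST)', 'City': 'BUDA', 'County': 'Hays', 'Receipts': Decimal('75')},
]


def location_sum_plain(rows, location_type, location):
    """Return (establishment, location, total) tuples, largest total first."""
    totals = defaultdict(Decimal)  # Decimal() starts each total at 0
    for row in rows:
        # Filter: compare the upper-cased value to the ALL CAPS argument
        if row[location_type].upper() == location:
            # Group: key on establishment and location, then sum receipts
            totals[(row['Establishment'], row[location_type])] += row['Receipts']
    flat = [(est, loc, total) for (est, loc), total in totals.items()]
    # Sort: most sold first
    return sorted(flat, key=lambda item: item[2], reverse=True)


for line in location_sum_plain(sample_rows, 'City', 'AUSTIN'):
    print(line)
```

Upper-casing the row value is what lets a county stored as `Hays` match the argument `'HAYS'`.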

## Data checking
I make sure I'm looking at the correct month, and that the number of records is not hitting my download limit.

In [12]:
# double-checking I'm looking at one month of data
mixbev_dates = mixbev.select('Report date').distinct('Report date')
print('This report date should equal your date_studied:\n')
mixbev_dates.print_table()
print('\nIf the number of records below is 20000, increase the download limit.')
print('\nNumber of records in table: {}'.format(
        len(mixbev))
     )

This report date should equal your date_studied:

|         Report date |
| ------------------- |
| 2018-04-30 00:00:00 |

If the number of records below is 20000, increase the download limit.

Number of records in table: 16259
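
The two checks above are done by eye. A small sketch of how they could be automated in plain Python; `check_download` is a hypothetical helper, and the values passed in would come from the table:

```python
def check_download(report_dates, record_count, date_studied, limit=20000):
    """Return a list of problems found; an empty list means both checks pass."""
    problems = []
    # Check 1: every row should carry the one date being studied
    distinct_dates = set(report_dates)
    if distinct_dates != {date_studied}:
        problems.append('unexpected report dates: {}'.format(sorted(distinct_dates)))
    # Check 2: a count at the limit suggests the download was cut off
    if record_count >= limit:
        problems.append('record count {} reached the limit {}'.format(record_count, limit))
    return problems


print(check_download(['2018-04-30'] * 3, 16259, '2018-04-30'))
```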


## Top sales statewide

Because we want to group our results by more than one field, we'll do this a little differently. We'll use group_by to create a grouped table, then aggregate that new table to sum the Receipts column.

In [13]:
# summing sales statewide for month

print('Total statewide sales for this month are: {}\n'.format(
    mixbev.aggregate(agate.Sum('Receipts')) # <<< I should format the number better
))

# groups the data based on Establishment, County and City
mixbev_grouped = mixbev.group_by('Establishment').group_by('County').group_by('City')

# computes the sales based on the grouping
state_summary = mixbev_grouped.aggregate([
    ('Sales_sum', agate.Sum('Receipts'))
])

# sorts the results by most sold. We could probably chain it above if we wanted to.
state_summary_sorted = state_summary.order_by('Sales_sum', reverse=True)

# prints the top 10 results
state_summary_sorted.limit(10).print_table(max_column_width=40)


Total statewide sales for this month are: 623558178

| Establishment                            | County  | City        | Sales_sum |
| ---------------------------------------- | ------- | ----------- | --------- |
| HOSPITALITY INTERNATIONAL, INC. (2380... | Bexar   | SAN ANTONIO | 1,922,631 |
| ARAMARK SPORTS & ENTERTAINMENT SERVIC... | Harris  | HOUSTON     | 1,864,394 |
| LEVY RESTAURANTS AT TOYOTA CENTER (15... | Harris  | HOUSTON     | 1,778,443 |
| RYAN SANDERS SPORTS SERVICES, LLC (92... | Travis  | DEL VALLE   | 1,623,825 |
| GAYLORD TEXAN (1501 GAYLORD TRL)         | Tarrant | GRAPEVINE   | 1,409,146 |
| WLS BEVERAGE CO (110 E 2ND ST)           | Travis  | AUSTIN      | 1,388,201 |
| THE POST OAK/MASTRO'S/WILLIE G'S (160... | Harris  | HOUSTON     | 1,381,065 |
| BOTTLED BLONDE- THE BACK YARD (501 N ... | Dallas  | DALLAS      | 1,283,728 |
| OMNI DALLAS CONVENTION CENTER (555 S ... | Dallas  | DALLAS      | 1,102,209 |
| SALC, INC. (2201 N STEMMONS FWY FL 1)    | Dallas  | D
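
The statewide total is printed without thousands separators. One way to format it, assuming the sum comes back as a `decimal.Decimal` (an assumption about the return value, not confirmed here), is Python's `,` format specifier:

```python
from decimal import Decimal

# Stand-in for the value returned by the statewide aggregate
total = Decimal('623558178')

# The ',' format specifier inserts thousands separators
formatted = '{:,}'.format(total)

print('Total statewide sales for this month are: {}'.format(formatted))
```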

## Austin sales and sums

Here we call the location_sum function defined above, passing the type of location (City) and the name of the city (AUSTIN). We then limit the number of records returned and print the results, chaining several steps together at once.

In [14]:
# Austin total sales as a city
# This sums the already-grouped table, but it works

print('Total sales in Austin are: {}\n'.format(
    location_sum('City', 'AUSTIN').aggregate(agate.Sum('Receipts_sum'))
))

# uses the location_sum function to filter
austin = location_sum('City', 'AUSTIN')


# print the resulting table
print('Top sellers in Austin are:\n')
austin.limit(25).print_table(max_column_width=60, max_rows=None)

Total sales in Austin are: 72406289

Top sellers in Austin are:

| Establishment                                                | City   | Receipts_sum |
| ------------------------------------------------------------ | ------ | ------------ |
| WLS BEVERAGE CO (110 E 2ND ST)                               | AUSTIN |    1,388,201 |
| BRICK HOUSE TAVERN + TAP (11680 RESEARCH BLVD # A)           | AUSTIN |      945,285 |
| FAIRMONT AUSTIN HOTEL (101 RED RIVER ST)                     | AUSTIN |      750,263 |
| UNBARLIEVABLE (76 RAINEY ST)                                 | AUSTIN |      509,102 |
| ROSE ROOM/ 77 DEGREES (11500 ROCK ROSE AVE)                  | AUSTIN |      466,783 |
| RIO (601 RIO GRANDE ST STE 200)                              | AUSTIN |      460,480 |
| SAN JACINTO BEVERAGE COMPANY LLC (98 SAN JACINTO BLVD)       | AUSTIN |      439,987 |
| CONTAINER BAR (90 RAINEY ST)                                 | AUSTIN |      435,793 |
| STUBB'S BAR-B-Q (801 RED RIVER ST)         

## More Central Texas cities
More of the same for different cities.

In [15]:
location_sum('City', 'BASTROP').limit(5).print_table(max_column_width=60)

| Establishment                                                | City    | Receipts_sum |
| ------------------------------------------------------------ | ------- | ------------ |
| OLD TOWN RESTURANT AND BAR/PINEY CREEK CHOP HOUSE (931 MA... | BASTROP |       71,708 |
| CHILI'S GRILL & BAR (734 HIGHWAY 71 W)                       | BASTROP |       57,208 |
| NEIGHBOR'S (601 CHESTNUT ST UNIT C)                          | BASTROP |       42,369 |
| BACK 9 (834 HIGHWAY 71 W)                                    | BASTROP |       31,213 |
| MORELIA MEXICAN GRILL (696 HIGHWAY 71 W STE 304)             | BASTROP |       25,889 |


In [16]:
location_sum('City', 'BEE CAVE').limit(3).print_table(max_column_width=60)

| Establishment                                         | City     | Receipts_sum |
| ----------------------------------------------------- | -------- | ------------ |
| MAUDIE'S HILL COUNTRY, LLC (12506 SHOPS PKWY)         | BEE CAVE |       93,547 |
| WOODY TAVERN AND GRILL, INC. (12801 SHOPS PKWY # 100) | BEE CAVE |       91,778 |
| THE GINGER MAN (12700 HILL COUNTRY BLVD STE T)        | BEE CAVE |       87,045 |


In [17]:
location_sum('City', 'BUDA').limit(3).print_table(max_column_width=60)

| Establishment                                               | City | Receipts_sum |
| ----------------------------------------------------------- | ---- | ------------ |
| BUCKS BACKYARD (1750 S FM 1626)                             | BUDA |       75,525 |
| WILLIE'S JOINT (824 MAIN ST)                                | BUDA |       73,436 |
| NATE'S AT BUDA MILL AND GRAIN, INC. (306 S MAIN ST STE 101) | BUDA |       64,752 |


In [18]:
location_sum('City', 'DEL VALLE').limit(3).print_table(max_column_width=60)

| Establishment                                                | City      | Receipts_sum |
| ------------------------------------------------------------ | --------- | ------------ |
| RYAN SANDERS SPORTS SERVICES, LLC (9201 CIRCUIT OF THE AM... | DEL VALLE |    1,623,825 |
| CLUB MIS AMIGOS (13820 FM 812)                               | DEL VALLE |        9,248 |
| RESTAURANTES JASMINES (2463 HIGWAY 71 E)                     | DEL VALLE |        4,159 |


In [19]:
location_sum('City', 'CEDAR PARK').limit(3).print_table(max_column_width=60)

| Establishment                                                | City       | Receipts_sum |
| ------------------------------------------------------------ | ---------- | ------------ |
| THE GROVE WINE BAR AND KITCHEN (1310 E WHITESTONE BLVD # ... | CEDAR PARK |      136,403 |
| CHUY'S (4911 183A TOLL RD)                                   | CEDAR PARK |      130,552 |
| LUPE TORTILLA MEXICAN RESTAURANT (4501 183A TOLL RD STE B)   | CEDAR PARK |      119,514 |


In [20]:
location_sum('City', 'DRIPPING SPRINGS').limit(3).print_table(max_column_width=60)

| Establishment                                             | City             | Receipts_sum |
| --------------------------------------------------------- | ---------------- | ------------ |
| TREATY OAK BREWING AND DISTILLING CO. (16604 FITZHUGH RD) | DRIPPING SPRINGS |       93,892 |
| FAMILY BUSINESS BEER COMPANY (19510 HAMILTON POOL RD)     | DRIPPING SPRINGS |       73,193 |
| FLORES MEXICAN RESTAURANT (2440 E HIGHWAY 290 BLDG D)     | DRIPPING SPRINGS |       61,737 |


In [21]:
location_sum('City', 'GEORGETOWN').limit(3).print_table(max_column_width=60)

| Establishment                                        | City       | Receipts_sum |
| ---------------------------------------------------- | ---------- | ------------ |
| EL MONUMENTO (205 W 2ND ST)                          | GEORGETOWN |      124,600 |
| GATEWAY BEVERAGES, INC (1101 WOODLAWN AVE)           | GEORGETOWN |       77,094 |
| BLUE CORN HARVEST BAR & GRILL (212 W 7TH ST STE 105) | GEORGETOWN |       71,039 |


In [22]:
location_sum('City', 'KYLE').limit(3).print_table(max_column_width=60)

| Establishment                                           | City | Receipts_sum |
| ------------------------------------------------------- | ---- | ------------ |
| RAILHOUSE (107 E CENTER ST)                             | KYLE |      144,416 |
| CASA GARCIA'S MEXICAN RESTAURANT (5401 FM 1626 STE 300) | KYLE |       75,150 |
| EVO ENTERTAINMENT CENTER (3200 KYLE XING)               | KYLE |       59,876 |


In [23]:
location_sum('City', 'LAGO VISTA').limit(3).print_table(max_column_width=60)

| Establishment                                         | City       | Receipts_sum |
| ----------------------------------------------------- | ---------- | ------------ |
| THE GNARLY GAR (18200 LAKEPOINT CV)                   | LAGO VISTA |       62,681 |
| COPPERHEAD GRILL (6115 LOHMANS FORD RD)               | LAGO VISTA |       21,273 |
| THE GRILLE AT HIGHLAND LAKES (20552 HIGHLAND LAKE DR) | LAGO VISTA |       20,023 |


In [24]:
location_sum('City', 'LAKEWAY').limit(3).print_table(max_column_width=60)

| Establishment                                          | City    | Receipts_sum |
| ------------------------------------------------------ | ------- | ------------ |
| THE GROVE WINE BAR AND KITCHEN (3001 RANCH ROAD 620 S) | LAKEWAY |      115,128 |
| LAKEWAY RESORT AND SPA (101 LAKEWAY DR)                | LAKEWAY |       98,234 |
| THE HILLS AT LAKEWAY (26 CLUB ESTATES PKWY)            | LAKEWAY |       74,821 |


In [25]:
location_sum('City', 'LEANDER').limit(3).print_table(max_column_width=60)

| Establishment                                                | City    | Receipts_sum |
| ------------------------------------------------------------ | ------- | ------------ |
| BROOKLYN HEIGHTS PIZZERIA (3550 LAKELINE BLVD STE 135)       | LEANDER |       54,995 |
| DA-FAN CLUB BAR & GRILL (15609 RONALD W REAGAN BLVD BLDG ... | LEANDER |       44,776 |
| JARDIN DEL REY (703 S HIGHWAY 183)                           | LEANDER |       41,793 |


In [26]:
location_sum('City', 'LIBERTY HILL').limit(3).print_table(max_column_width=60)

| Establishment                                      | City         | Receipts_sum |
| -------------------------------------------------- | ------------ | ------------ |
| JARDIN CORONA (15395 W STATE HIGHWAY 29)           | LIBERTY HILL |       46,371 |
| ELENA'S BAR & GRILL LLC (14801 W STATE HIGHWAY 29) | LIBERTY HILL |       44,776 |
| MARGARITA'S RESTAURANT (10280 W STATE HIGHWAY 29)  | LIBERTY HILL |       35,561 |


In [27]:
location_sum('City', 'PFLUGERVILLE').limit(3).print_table(max_column_width=60)

| Establishment                                  | City         | Receipts_sum |
| ---------------------------------------------- | ------------ | ------------ |
| MAVERICKS (1700 GRAND AVENUE PKWY STE 240)     | PFLUGERVILLE |      125,696 |
| HANOVER'S DRAUGHT HAUS (108 E MAIN ST)         | PFLUGERVILLE |       72,005 |
| RIO GRANDE TEX MEX (15821 CENTRAL COMMERCE DR) | PFLUGERVILLE |       67,527 |


In [28]:
location_sum('City', 'ROUND ROCK').limit(5).print_table(max_column_width=60)

| Establishment                                               | City       | Receipts_sum |
| ----------------------------------------------------------- | ---------- | ------------ |
| RYAN SANDERS SPORTS SERVICES, LLC (3400 E PALM VALLEY BLVD) | ROUND ROCK |      184,214 |
| CHUY'S ROUND ROCK (2320 N INTERSTATE 35)                    | ROUND ROCK |      134,767 |
| JACK ALLEN'S KITCHEN (2500 HOPPE TRL)                       | ROUND ROCK |      133,269 |
| RICK'S CABARET (3105 S INTERSTATE 35)                       | ROUND ROCK |      129,807 |
| TWIN PEAKS RESTAURANT (100 LOUIS HENNA BLVD)                | ROUND ROCK |      125,160 |


In [29]:
location_sum('City', 'SAN MARCOS').limit(5).print_table(max_column_width=60)

| Establishment                           | City       | Receipts_sum |
| --------------------------------------- | ---------- | ------------ |
| PLUCKERS WING BAR (105 N INTERSTATE 35) | SAN MARCOS |      125,863 |
| THE MARC (120 E SAN ANTONIO ST)         | SAN MARCOS |      122,556 |
| ZELICKS (336 W HOPKINS ST)              | SAN MARCOS |      116,719 |
| CHIMY'S SAN MARCOS (217 E HOPKINS ST)   | SAN MARCOS |      107,626 |
| MAYLOO'S (138 N LBJ DR)                 | SAN MARCOS |      106,459 |


In [30]:
location_sum('City', 'SPICEWOOD').limit(3).print_table(max_column_width=60)

| Establishment                                        | City      | Receipts_sum |
| ---------------------------------------------------- | --------- | ------------ |
| ANGEL'S ICEHOUSE (21815 W HWY 71)                    | SPICEWOOD |       58,676 |
| POODIES HILLTOP ROADHOUSE (22308 STATE HIGHWAY 71 W) | SPICEWOOD |       46,767 |
| THE LIGHTHOUSE RESTAURANT & LOUNGE (513 SLEAT DR)    | SPICEWOOD |       39,208 |


In [31]:
location_sum('City', 'SUNSET VALLEY').limit(3).print_table(max_column_width=60)

| Establishment                                          | City          | Receipts_sum |
| ------------------------------------------------------ | ------------- | ------------ |
| DOC'S BACKYARD (5207 BRODIE LN # 100)                  | SUNSET VALLEY |       78,164 |
| BJ'S RESTAURANT AND BREWHOUSE (5207 BRODIE LN STE 300) | SUNSET VALLEY |       60,247 |
| CANE ROSSO (4715 S LAMAR BLVD STE 103)                 | SUNSET VALLEY |       31,873 |


In [32]:
location_sum('City', 'WEST LAKE HILLS').limit(3).print_table(max_column_width=60)

| Establishment                                                | City            | Receipts_sum |
| ------------------------------------------------------------ | --------------- | ------------ |
| LUPE TORTILLA MEXICAN RESTAURANT (701 S CAPITAL OF TEXAS ... | WEST LAKE HILLS |       94,483 |
| SWAY (3437 BEE CAVES RD)                                     | WEST LAKE HILLS |       44,776 |
| CHIPOTLE CHIPOTLE MEXICAN GRILL CHIPOTLE BURRITOS (3300 B... | WEST LAKE HILLS |          679 |


## Sales by county
In this case, we pass in the location type of 'County' and then a county name in caps to get the most sales in a particular county.

In [33]:
# MOVE TO SEPARATE CELLS SO THEY DON'T FORCE A SCROLL
location_sum('County', 'BASTROP').limit(5).print_table(max_column_width=80)
print('\n')
location_sum('County', 'CALDWELL').limit(5).print_table(max_column_width=80)
print('\n')
location_sum('County', 'HAYS').limit(5).print_table(max_column_width=80)
print('\n')
location_sum('County', 'TRAVIS').limit(5).print_table(max_column_width=80)
print('\n')
location_sum('County', 'WILLIAMSON').limit(5).print_table(max_column_width=80)

| Establishment                                                   | County  | Receipts_sum |
| --------------------------------------------------------------- | ------- | ------------ |
| LOST PINES BEVERAGE LLC (575 HYATT LOST PINES ROAD)             | Bastrop |      653,634 |
| OLD TOWN RESTURANT AND BAR/PINEY CREEK CHOP HOUSE (931 MAIN ST) | Bastrop |       71,708 |
| CHILI'S GRILL & BAR (734 HIGHWAY 71 W)                          | Bastrop |       57,208 |
| NEIGHBOR'S (601 CHESTNUT ST UNIT C)                             | Bastrop |       42,369 |
| BACK 9 (834 HIGHWAY 71 W)                                       | Bastrop |       31,213 |


| Establishment                                               | County   | Receipts_sum |
| ----------------------------------------------------------- | -------- | ------------ |
| EL REY BAR/NIGHTCLUB (1721 S COMMERCE ST)                   | Caldwell |       25,354 |
| GUADALAJARA MEXICAN RESTAURANT (1710 S COLORADO ST STE 110) | Caldwell |   

## Sales by ZIP Code
A list of sales by ZIP Code. If anything other than 78701 is at the top, it is news.

In [34]:
# top zip code gross receipts
zip_receipts = mixbev.pivot('Zip', aggregation=agate.Sum('Receipts')).order_by('Sum', reverse=True)
zip_receipts.limit(5).print_table()

| Zip   |        Sum |
| ----- | ---------- |
| 78701 | 28,419,145 |
| 78205 | 15,578,119 |
| 77002 | 13,843,162 |
| 75201 | 11,972,048 |
| 78704 |  8,588,920 |
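
The "is it news" test described above can be sketched in plain Python: sum receipts per ZIP Code, rank them, and compare the top entry with 78701. The rows are made up, and `top_zips` and `is_news` are hypothetical helpers:

```python
from collections import defaultdict

# Made-up rows with only the fields needed here
zip_rows = [
    {'Zip': '78701', 'Receipts': 100},
    {'Zip': '78704', 'Receipts': 250},
    {'Zip': '78701', 'Receipts': 200},
    {'Zip': '77002', 'Receipts': 50},
]


def top_zips(rows, n=5):
    """Return the n ZIP Codes with the largest summed receipts."""
    totals = defaultdict(int)
    for row in rows:
        totals[row['Zip']] += row['Receipts']
    ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)
    return ranked[:n]


def is_news(ranked, expected_top='78701'):
    """True when some ZIP Code other than the expected one leads the list."""
    return bool(ranked) and ranked[0][0] != expected_top


ranked = top_zips(zip_rows)
print(ranked, is_news(ranked))
```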
