# Goal
To find the best place (metropolitan statistical area) to open a dance studio.

### Difficulty
Dance studio market statistics are not available at the regional level.

### What is available then?
- Employment data for dancers and related occupations
    - Dancers, performing art directors, fitness instructors, etc.
- Yearly revenue data for dance studio, fitness, and performing art businesses

### Hypothesis to test
- Employees' hourly wage x number of employees (employment size) is proportional to revenue.
- Test this idea with the overall U.S. statistics.
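
One way to check the proportionality hypothesis is to fit revenue against wage x employment with no intercept and look at how stable the ratio is across years. The following is only a minimal sketch: the function names and the input numbers are hypothetical placeholders, not real statistics, and the stability measure (coefficient of variation of the ratio) is one possible choice rather than the method used later in this notebook.

```python
import numpy as np

def fit_through_origin(wage_bill, revenue):
    """Least-squares slope k for revenue ~ k * wage_bill (no intercept)."""
    wage_bill = np.asarray(wage_bill, dtype=float)
    revenue = np.asarray(revenue, dtype=float)
    return np.dot(wage_bill, revenue) / np.dot(wage_bill, wage_bill)

def ratio_spread(wage_bill, revenue):
    """Coefficient of variation of revenue / wage_bill.

    A value near 0 means the ratio is stable, which supports proportionality.
    """
    ratio = np.asarray(revenue, dtype=float) / np.asarray(wage_bill, dtype=float)
    return ratio.std() / ratio.mean()

# Placeholder inputs only (NOT real data): one value per year
hourly_wage = np.array([10.0, 11.0, 12.0])
employment = np.array([100.0, 110.0, 120.0])
wage_bill = hourly_wage * employment

# Revenue built to be exactly proportional, so the check should pass
revenue = 3.0 * wage_bill

k = fit_through_origin(wage_bill, revenue)
cv = ratio_spread(wage_bill, revenue)
print(f"slope k = {k:.3f}, ratio spread = {cv:.3e}")
```

With real data, a large spread in the ratio would suggest that the hypothesis does not hold and that the regional estimate in the strategy needs a different model.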

### Strategy
1. If my hypothesis is correct, I can estimate the total revenue size of a region using employment data.
2. Estimate the number of dance studios in each region from Google Maps (is there a better idea?)
3. Divide item 1 by item 2 to get an average revenue of a dance studio.
4. Estimate the cost of a dance studio in each region (use tax database, rent prices from the web, etc).
5. Calculate profit by subtracting item 4 from item 3.
6. Find the region with the highest profit.
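
The steps above can be sketched as a small pandas calculation. This is an illustration under stated assumptions: the column names (`est_revenue`, `n_studios`, `cost_per_studio`), the function `rank_regions`, and all numbers are hypothetical placeholders, not values from the data sources.

```python
import pandas as pd

def rank_regions(df):
    """Compute average revenue and profit per studio, then sort by profit."""
    out = df.copy()
    # Step 3: regional revenue divided by the number of studios
    out["revenue_per_studio"] = out["est_revenue"] / out["n_studios"]
    # Step 5: average revenue minus the estimated cost of one studio
    out["profit_per_studio"] = out["revenue_per_studio"] - out["cost_per_studio"]
    # Step 6: most profitable regions first
    return out.sort_values("profit_per_studio", ascending=False).reset_index(drop=True)

# Placeholder regions (NOT real data)
regions = pd.DataFrame({
    "region": ["A", "B"],
    "est_revenue": [1_000_000.0, 3_000_000.0],   # step 1: from employment data
    "n_studios": [10, 20],                       # step 2: studio count
    "cost_per_studio": [60_000.0, 120_000.0],    # step 4: rent, tax, etc.
})

ranked = rank_regions(regions)
print(ranked[["region", "revenue_per_studio", "profit_per_studio"]])
```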


# Data


## Source

### Government data
- Employment statistics: https://www.bls.gov
- Population estimation: https://www.census.gov/data/tables/time-series/demo/popest/2010s-total-metro-and-micro-statistical-areas.html

### Open data from companies
- Statista
    - The terms "market size" and "revenue" do not seem to be distinguished. Here, I assume both mean revenue.
- Federal Reserve Bank of St. Louis

## Data explanation
    
### Dance studio in the U.S.
- Statistics for 2021
    - https://www.thestudiodirector.com/blog/dance-studio-industry-stats/
        - 2021
        - 54,627 studios
        - Owner's annual income: \\$43,048 (from ZipRecruiter)
        - Total profit: \\$2.35 billion (owner's income x number of studios)
        - Competitors: health club that offers dance classes along with fitness programs
- Market size over years
    - https://www.statista.com/statistics/1175824/dance-studio-industry-market-size-us/
        - 2021 is a forecast (\\$3.72 billion)
        - ```dance_studio```
- Profit
    - Assumptions
        - Total revenue of all studios is equal to the market size, \\$3.72 billion
        - Total profit of all studios is \\$43k (owner's annual income) * 54.6k (the number of studios) = \\$2.35 billion
    - **Average profit margin** in percent: Profit/Revenue = 2.35/3.72 = **63\%** in 2021
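
The arithmetic behind this estimate can be reproduced with the figures quoted above. The variable names are my own, and the calculation keeps the assumption that the owner's annual income stands in for the profit of one studio.

```python
# Figures quoted above for 2021
n_studios = 54_627        # number of studios
owner_income = 43_048     # owner's annual income in dollars (used as profit per studio)
market_size = 3.72e9      # Statista forecast, assumed to equal total revenue

total_profit = n_studios * owner_income
profit_share = total_profit / market_size

print(f"Total profit: ${total_profit / 1e9:.2f} billion")   # Total profit: $2.35 billion
print(f"Profit / revenue: {profit_share:.0%}")              # Profit / revenue: 63%
```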
        

### Fitness/health club in the U.S.
- Revenue statistics
    - https://www.statista.com/statistics/605223/us-fitness-health-club-market-size-2007-2021/
        - 2021 is a forecast (\\$3.72 billion)
        - ```fitness```
    - https://fred.stlouisfed.org/series/REVEF71394ALLEST
        - 2021 is empty
        - ```fitness_fred```

### Performing art in the U.S.
- Revenue statistics
    - https://fred.stlouisfed.org/series/REV7111AMSA
        - ```perform_fred```

- Statistics for 2011
    - https://www.arts.gov/sites/default/files/102.pdf
        - As of 2011
        - 8,840 organizations
        - 127,648 paid workers
        - \$13.6 billion revenue

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

pd.options.display.max_colwidth = 200

# Clean and wrangle dataset

### Revenue data

In [5]:
rev = pd.read_csv('data/rev.csv') # Manually organized yearly revenue data

# Extract the year from the first four characters of DATE
rev['year'] = pd.to_numeric(rev.DATE.str[:4])
# Express everything in billions of dollars
rev['fitness_fred'] = rev['fitness_fred'] / 1000.
rev['perform_fred'] = rev['perform_fred'] / 1000.
rev.drop(['DATE'], inplace=True, axis=1)

display(rev)

Unnamed: 0,fitness_fred,perform_fred,dance_studio,fitness,year
0,10.797,,,,1998
1,11.777,,,,1999
2,12.543,,,,2000
3,13.542,,,,2001
4,14.987,,,,2002
5,16.287,,,,2003
6,17.174,,,,2004
7,18.286,,,,2005
8,19.447,,,,2006
9,21.416,,,,2007
