# Fundraising

The following document is my analysis of startup funding rounds.

## Objectives

- Understand the dynamics between funding rounds
- Identify trends in **Series A funding** vs **Angel round funding**
- Compare funding sizes across cities


### Guide

Hi class, this evening we'll spend 40 minutes on hands-on work. Your task is to take the role of an investor or venture capitalist analyzing the fundraising records of startups (data from TechCrunch). The dataset is `techcrunch.csv`, in your `data_input` folder. Here are the questions to guide your analysis:

1. How many rows and columns does the dataset have?
2. How many startup funding rounds were there in California compared with New York? (tip: `value_counts()`)
3. Using `describe()`, what is the average funding size (raised amount)?
4. Using `describe()`, what is the standard deviation of the funding size (raised amount)?
5. What is the largest funding round in the dataset, and which company raised it?
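For question 2, one reasonable reading is to count rounds per state using the `state` column, which holds two-letter codes such as `CA` and `NY`. The sketch below assumes that reading and runs on a small made-up DataFrame (`toy`) that only mimics the dataset's column names, so the counts are illustrative, not the real answer.

```python
import pandas as pd

# Hypothetical toy rows standing in for techcrunch.csv (same column names).
toy = pd.DataFrame({
    "company": ["A", "B", "C", "D", "E"],
    "city": ["San Francisco", "New York", "Palo Alto", "New York", "Mountain View"],
    "state": ["CA", "NY", "CA", "NY", "CA"],
    "raisedAmt": [5_000_000, 2_000_000, 300_000_000, 1_000_000, 750_000],
})

# Count funding rounds per state, then pick out the two states of interest.
state_counts = toy["state"].value_counts()
ca_vs_ny = state_counts[["CA", "NY"]]
print(ca_vs_ny)
```

Swapping `"state"` for `"city"` would instead compare the city of New York with individual Californian cities.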

We'll do this together and add a few more techniques to our belt as revision. We will all start from a blank notebook and build our analysis sequentially from scratch: import libraries, `read_csv`, and then work through the questions.

See you this evening!

---

## Reading the data

```python
import pandas as pd
print(pd.__version__)
```

```
0.24.2
```


```python
tc = pd.read_csv("data_input/techcrunch.csv")
tc.tail()
```

| | permalink | company | numEmps | category | city | state | fundedDate | raisedAmt | raisedCurrency | round |
|---|---|---|---|---|---|---|---|---|---|---|
| 1455 | trusera | Trusera | 15.0 | web | Seattle | WA | 1-Jun-07 | 2000000 | USD | angel |
| 1456 | alerts-com | Alerts.com | | web | Bellevue | WA | 8-Jul-08 | 1200000 | USD | a |
| 1457 | myrio | Myrio | 75.0 | software | Bothell | WA | 1-Jan-01 | 20500000 | USD | unattributed |
| 1458 | grid-networks | Grid Networks | | web | Seattle | WA | 30-Oct-07 | 9500000 | USD | a |
| 1459 | grid-networks | Grid Networks | | web | Seattle | WA | 20-May-08 | 10500000 | USD | b |


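```python
# describe() on a text column reports the count, the number of unique values,
# the most frequent value (top) and how often it occurs (freq)
tc['round'].describe()
```

```
count     1460
unique       9
top          a
freq       582
Name: round, dtype: object
```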

### Understanding the shape of data

```python
print(f'The dimension of the data is {tc.shape}')
print(f'Size of data is: {tc.size}')
```

```
The dimension of the data is (1460, 10)
Size of data is: 14600
```
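`size` is simply rows times columns, which is why 1460 × 10 gives 14600. It counts every cell, missing ones included, so it is not a count of non-null values. A minimal sketch on a hypothetical 3-row, 4-column frame:

```python
import pandas as pd

# Hypothetical 3-row, 4-column frame; any DataFrame behaves the same way.
df = pd.DataFrame({
    "company": ["A", "B", "C"],
    "city": ["Tempe", "Phoenix", "Seattle"],
    "numEmps": [7.0, None, 15.0],
    "raisedAmt": [50_000, 3_000_000, 2_000_000],
})

n_rows, n_cols = df.shape          # shape is a (rows, columns) tuple
print(n_rows, n_cols, df.size)     # size counts every cell, the NaN included
```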


### City Analysis

The top three cities by number of funding rounds are, in descending order, San Francisco, New York, and Mountain View:

```python
tc.city.value_counts().head(7)
```

```
San Francisco    228
New York          93
Mountain View     89
Palo Alto         78
Seattle           75
San Mateo         70
Redwood City      42
Name: city, dtype: int64
```
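`value_counts()` ranks cities by how many rounds they hosted, not by how much money was raised. To rank by total amount, one option is to group by city and sum `raisedAmt`. The sketch below uses hypothetical rows chosen so the two rankings disagree.

```python
import pandas as pd

# Hypothetical rows: one city has several small rounds, another a single huge one.
toy = pd.DataFrame({
    "city": ["San Francisco", "San Francisco", "San Francisco", "Palo Alto"],
    "raisedAmt": [1_000_000, 2_000_000, 3_000_000, 300_000_000],
})

rounds_per_city = toy["city"].value_counts()                 # how many rounds
total_per_city = toy.groupby("city")["raisedAmt"].sum()      # how much money
print(rounds_per_city.idxmax(), total_per_city.idxmax())
```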

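```python
# Express the statistics in millions of USD.
# Note: count is divided too, so 1460 rows shows up as 0.001460.
tc.raisedAmt.describe()/1000000
```

```
count      0.001460
mean      10.131488
std       18.661462
min        0.006000
25%        2.000000
50%        5.500000
75%       11.025000
max      300.000000
Name: raisedAmt, dtype: float64
```

The average round is about USD 10.1 million, with a standard deviation of about USD 18.7 million; the mean sits well above the median of USD 5.5 million.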
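To keep `count` as a real row count, one option is to scale the column before calling `describe()` rather than dividing its output. A minimal sketch on a hypothetical four-value Series:

```python
import pandas as pd

raised = pd.Series([50_000, 2_000_000, 6_000_000, 25_000_000], name="raisedAmt")

# Dividing the whole describe() output also divides `count`.
scaled_after = raised.describe() / 1_000_000
# Dividing the data first keeps `count` as a row count.
scaled_before = (raised / 1_000_000).describe()
print(scaled_after["count"], scaled_before["count"])
```

Every other statistic is the same either way; only `count` differs.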

```python
tc.raisedAmt.max()
```

```
300000000
```

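```python
tc.loc[tc.raisedAmt == tc.raisedAmt.max(), :]
```

| | permalink | company | numEmps | category | city | state | fundedDate | raisedAmt | raisedCurrency | round |
|---|---|---|---|---|---|---|---|---|---|---|
| 14 | facebook | Facebook | 450.0 | web | Palo Alto | CA | 1-Oct-07 | 300000000 | USD | c |
| 1104 | zenimax | ZeniMax | | web | Rockville | MD | 1-Oct-07 | 300000000 | USD | a |

Two rounds tie for the largest amount, USD 300 million: Facebook (round `c`) and ZeniMax (round `a`), both dated 1-Oct-07.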
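The boolean mask above is a good choice here because it returns every tied row, whereas `idxmax()` returns only the first index label that reaches the maximum. A sketch on hypothetical rows that mirror the tie:

```python
import pandas as pd

toy = pd.DataFrame({
    "company": ["Facebook", "ZeniMax", "LifeLock"],
    "raisedAmt": [300_000_000, 300_000_000, 25_000_000],
})

top = toy["raisedAmt"].max()
all_ties = toy.loc[toy["raisedAmt"] == top, "company"]       # every row at the max
first_only = toy.loc[toy["raisedAmt"].idxmax(), "company"]   # first label only
print(list(all_ties), first_only)
```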


```python
tc.company.head()
```

```
0       LifeLock
1       LifeLock
2       LifeLock
3    MyCityFaces
4       Flypaper
Name: company, dtype: object
```

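```python
# .iloc selects by integer position; the stop of each slice is excluded,
# so this returns rows 0-3 and columns 4-7
tc.iloc[0:4, 4:8]
```

| | city | state | fundedDate | raisedAmt |
|---|---|---|---|---|
| 0 | Tempe | AZ | 1-May-07 | 6850000 |
| 1 | Tempe | AZ | 1-Oct-06 | 6000000 |
| 2 | Tempe | AZ | 1-Jan-08 | 25000000 |
| 3 | Scottsdale | AZ | 1-Jan-08 | 50000 |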


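```python
# .loc selects by label; label slices include both endpoints (rows 0 to 5)
tc.loc[0:5, ['permalink', 'company', 'numEmps']]
```

| | permalink | company | numEmps |
|---|---|---|---|
| 0 | lifelock | LifeLock | |
| 1 | lifelock | LifeLock | |
| 2 | lifelock | LifeLock | |
| 3 | mycityfaces | MyCityFaces | 7.0 |
| 4 | flypaper | Flypaper | |
| 5 | infusionsoft | Infusionsoft | 105.0 |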
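The two cells differ in how slices end: `.iloc[0:4]` stops before position 4, while `.loc[0:5]` includes label 5. With the default integer index, labels and positions coincide, which makes the difference easy to miss. A sketch on a hypothetical six-row frame:

```python
import pandas as pd

df = pd.DataFrame(
    {"permalink": list("abcdef"), "company": list("ABCDEF"), "numEmps": range(6)}
)

by_position = df.iloc[0:4, 0:2]                  # rows 0-3, columns 0-1 (stop excluded)
by_label = df.loc[0:4, "permalink":"company"]    # rows 0-4, both end labels included
print(by_position.shape, by_label.shape)
```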


```python
condition = tc.numEmps > 400
cols_to_return = ['company', 'city', 'raisedAmt']
tc.loc[condition, cols_to_return]
```

| | company | city | raisedAmt |
|---|---|---|---|
| 11 | Facebook | Palo Alto | 500000 |
| 12 | Facebook | Palo Alto | 12700000 |
| 13 | Facebook | Palo Alto | 27500000 |
| 14 | Facebook | Palo Alto | 300000000 |
| 15 | Facebook | Palo Alto | 40000000 |
| 16 | Facebook | Palo Alto | 15000000 |
| 17 | Facebook | Palo Alto | 100000000 |
| 68 | Google | Mountain View | 25000000 |
| 311 | Art.com | Emeryville | 30000000 |
| 318 | Friendster | San Francisco | 2400000 |
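Conditions can be combined with `&` (and) or `|` (or). Each comparison needs its own parentheses because `&` binds more tightly than `>`. Rows where `numEmps` is missing compare as `False`, so they drop out. A sketch on hypothetical companies `A` to `D`:

```python
import pandas as pd

toy = pd.DataFrame({
    "company": ["A", "B", "C", "D"],
    "numEmps": [450.0, 20000.0, 15.0, None],   # D has no employee count (NaN)
    "raisedAmt": [300_000_000, 25_000_000, 2_000_000, 500_000_000],
})

# Large headcount AND a large round; NaN > 400 evaluates to False.
cond = (toy["numEmps"] > 400) & (toy["raisedAmt"] > 50_000_000)
result = toy.loc[cond, ["company", "raisedAmt"]]
print(result)
```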


```python
tc.head()
```

| | permalink | company | numEmps | category | city | state | fundedDate | raisedAmt | raisedCurrency | round |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | lifelock | LifeLock | | web | Tempe | AZ | 1-May-07 | 6850000 | USD | b |
| 1 | lifelock | LifeLock | | web | Tempe | AZ | 1-Oct-06 | 6000000 | USD | a |
| 2 | lifelock | LifeLock | | web | Tempe | AZ | 1-Jan-08 | 25000000 | USD | c |
| 3 | mycityfaces | MyCityFaces | 7.0 | web | Scottsdale | AZ | 1-Jan-08 | 50000 | USD | seed |
| 4 | flypaper | Flypaper | | web | Phoenix | AZ | 1-Feb-08 | 3000000 | USD | a |


```python
isinstance(tc['company'], pd.Series)
```

```
True
```
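A single label inside the square brackets returns a Series, while a list of labels (even a one-item list) returns a DataFrame. A minimal sketch on a hypothetical two-row frame:

```python
import pandas as pd

df = pd.DataFrame({"company": ["LifeLock", "Flypaper"], "city": ["Tempe", "Phoenix"]})

one_col = df["company"]        # single label -> Series
one_col_df = df[["company"]]   # list of labels -> DataFrame
print(type(one_col).__name__, type(one_col_df).__name__)
```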

## [Startup Investing Insight] Hands-on Challenge: Day 3

1. Compare `sum(tc.numEmps)` with `tc.numEmps.sum()`. What's the difference?
2. Use square-bracket indexing `dat['col_selected']` to compute a frequency table (`.value_counts()`) on the `raisedCurrency` column. Is any currency other than USD used in the fundraising dataset?
3. Perform conditional subsetting (boolean indexing) using the syntax `dat[cond1]` or `dat.loc[cond1, :]` to return the rows where `round` equals `seed`. Chain it with `.tail()` so you don't print the full result.
4. Create a condition that selects the rows where `company` is `Tesla Motors`. Pass this condition as you did in (3) to perform the boolean indexing, but return only the columns `round`, `company`, `raisedCurrency` and `raisedAmt`. Use `.loc` so you can specify **column selection by label**.
5. Use `.iloc` to select the first 10 rows and only the first 5 columns of the DataFrame.
6. Go back to (4) and chain `.sort_values('round')` onto the resulting DataFrame to sort it by the values in `round`.
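For challenge 1, one thing worth checking is how each version treats missing values, since `numEmps` contains NaN. Python's built-in `sum` adds element by element, and a single NaN turns the total into NaN; the pandas method skips NaN by default (`skipna=True`). A sketch on a hypothetical three-value Series:

```python
import math
import pandas as pd

num_emps = pd.Series([15.0, None, 75.0], name="numEmps")   # None becomes NaN

builtin_total = sum(num_emps)   # 0 + 15.0 + NaN + 75.0 -> NaN
pandas_total = num_emps.sum()   # NaN skipped -> 90.0
print(builtin_total, pandas_total)
```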

Optional: Download Visual Studio Code and install Flask.

```python
cond1 = tc['company'] == 'Tesla Motors'
tc.loc[cond1, ["company", "round", "raisedCurrency", "raisedAmt"]].sort_values(
    ['round', 'raisedAmt'])
```

| | company | round | raisedCurrency | raisedAmt |
|---|---|---|---|---|
| 538 | Tesla Motors | a | USD | 7500000 |
| 539 | Tesla Motors | b | USD | 13000000 |
| 536 | Tesla Motors | c | USD | 40000000 |
| 537 | Tesla Motors | d | USD | 45000000 |
| 540 | Tesla Motors | e | USD | 40000000 |
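Sorting `round` as text works for Tesla because its rounds are `a` to `e`, but across the whole dataset plain alphabetical order would put `angel` after `a` and `seed` after `e`. One option is an ordered `pd.Categorical`. The stage order below is a hypothetical choice for illustration, and any label missing from `stage_order` would become NaN.

```python
import pandas as pd

toy = pd.DataFrame({
    "round": ["b", "seed", "a", "angel", "c"],
    "raisedAmt": [13_000_000, 50_000, 7_500_000, 2_000_000, 40_000_000],
})

# Hypothetical stage order; extend it with any other labels your data contains.
stage_order = ["seed", "angel", "a", "b", "c", "d", "e"]
toy["round"] = pd.Categorical(toy["round"], categories=stage_order, ordered=True)

# sort_values now follows the category order instead of alphabetical order.
print(toy.sort_values("round"))
```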


```python
rows_select = tc['company'] == 'Facebook'
tc.loc[rows_select, ['numEmps', 'city', 'raisedAmt']]
```

| | numEmps | city | raisedAmt |
|---|---|---|---|
| 11 | 450.0 | Palo Alto | 500000 |
| 12 | 450.0 | Palo Alto | 12700000 |
| 13 | 450.0 | Palo Alto | 27500000 |
| 14 | 450.0 | Palo Alto | 300000000 |
| 15 | 450.0 | Palo Alto | 40000000 |
| 16 | 450.0 | Palo Alto | 15000000 |
| 17 | 450.0 | Palo Alto | 100000000 |
