## Loading the original data file (saved as a CSV from the original Excel file)


Import `pandas` (as `pd`, to save typing), read the CSV file into a `DataFrame`, and preview the first five rows with `head()`.

```python
import pandas as pd
ipos = pd.read_csv('ipos.csv') 
ipos.head()
```

Press _shift-enter_ to execute the code.

| | Year | Number of IPOs | % Profitable | Number of IPOs.1 | % Profitable.1 |
|---|---|---|---|---|---|
| 0 | 1980 | 25 | 88% | 46 | 70% |
| 1 | 1981 | 82 | 81% | 110 | 85% |
| 2 | 1982 | 44 | 82% | 33 | 79% |
| 3 | 1983 | 194 | 68% | 257 | 86% |
| 4 | 1984 | 52 | 81% | 121 | 84% |
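
The `.1` suffixes and the `%` signs in the output above both come from how `read_csv` parses the file. The sketch below illustrates this with a small in-memory stand-in for `ipos.csv`; the header layout with repeated column names is an assumption about the exported file, and `raw` and `sample` are names made up for this example.

```python
import io
import pandas as pd

# Hypothetical stand-in for ipos.csv: the first two rows shown above, with the
# duplicated header names the Excel sheet is assumed to have had.
raw = io.StringIO(
    "Year,Number of IPOs,% Profitable,Number of IPOs,% Profitable\n"
    "1980,25,88%,46,70%\n"
    "1981,82,81%,110,85%\n"
)
sample = pd.read_csv(raw)

# pandas de-duplicates repeated header names by appending .1, .2, ...
print(list(sample.columns))

# The count columns parse as numbers; the percent columns stay as text
# because of the trailing % sign.
print(pd.api.types.is_numeric_dtype(sample["Number of IPOs"]))
print(pd.api.types.is_numeric_dtype(sample["% Profitable"]))
```

This is why the percent columns need cleaning before any arithmetic can be done on them.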


### Rename columns

Rename the columns (the Excel spreadsheet didn't note which columns were Tech & Biotech and which were Other IPOs).

```python
ipos = ipos.rename(columns={
        'Year': 'year',
        'Number of IPOs': 'num_ipos_tech',
        '% Profitable': 'percentprof_tech', 
        'Number of IPOs.1': 'num_ipos_other', 
        '% Profitable.1': 'percentprof_other', 
    })
ipos.head()
```

| | year | num_ipos_tech | percentprof_tech | num_ipos_other | percentprof_other |
|---|---|---|---|---|---|
| 0 | 1980 | 25 | 88% | 46 | 70% |
| 1 | 1981 | 82 | 81% | 110 | 85% |
| 2 | 1982 | 44 | 82% | 33 | 79% |
| 3 | 1983 | 194 | 68% | 257 | 86% |
| 4 | 1984 | 52 | 81% | 121 | 84% |
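
One thing to watch with `rename`: by default it silently ignores keys that do not match an existing column, so a typo in the mapping leaves the column unrenamed without any warning. A minimal sketch of how to catch that with the `errors="raise"` option, using a hypothetical two-row frame (`sample`, `unchanged`, and `typo_caught` are illustrative names only):

```python
import pandas as pd

# Hypothetical two-row frame using the original column names.
sample = pd.DataFrame({"Year": [1980, 1981], "Number of IPOs": [25, 82]})

# Default behavior: the misspelled key is ignored and nothing is renamed.
unchanged = sample.rename(columns={"Yeer": "year"})
print(list(unchanged.columns))

# With errors="raise", the same typo surfaces as a KeyError.
try:
    sample.rename(columns={"Yeer": "year"}, errors="raise")
    typo_caught = False
except KeyError:
    typo_caught = True
print(typo_caught)
```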


### Clean data

Remove the percent signs from the two percent-profitable columns and convert the strings to floats, so we have plain numbers to work with (the column names already indicate that the values are percentages).

```python
ipos['percentprof_tech'] = ipos['percentprof_tech'].map(lambda x: str(x)[:-1]).astype(float)
ipos['percentprof_other'] = ipos['percentprof_other'].map(lambda x: str(x)[:-1]).astype(float)
ipos.head()
```

| | year | num_ipos_tech | percentprof_tech | num_ipos_other | percentprof_other |
|---|---|---|---|---|---|
| 0 | 1980 | 25 | 88 | 46 | 70 |
| 1 | 1981 | 82 | 81 | 110 | 85 |
| 2 | 1982 | 44 | 82 | 33 | 79 |
| 3 | 1983 | 194 | 68 | 257 | 86 |
| 4 | 1984 | 52 | 81 | 121 | 84 |
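
The slicing approach above drops the last character of every value, whatever it is. As an alternative sketch (not the method used in this notebook), the vectorized string method `str.rstrip` removes only a trailing `%` and leaves values without one unchanged. The series below holds the first five tech values as they appear before cleaning; `percent_text` and `percent_values` are names chosen for this example.

```python
import pandas as pd

# First five values of the tech percent column, as they appear before cleaning.
percent_text = pd.Series(["88%", "81%", "82%", "68%", "81%"])

# Strip a trailing % if present, then convert to float.
percent_values = percent_text.str.rstrip("%").astype(float)
print(percent_values.tolist())
```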


### Summary statistics
The `describe` method shows basic summary statistics for every numeric column in the `DataFrame`.

```python
ipos.describe()
```

| | year | num_ipos_tech | percentprof_tech | num_ipos_other | percentprof_other |
|---|---|---|---|---|---|
| count | 35.0 | 35.0 | 35.0 | 35.0 | 35.0 |
| mean | 1997.0 | 102.542857 | 50.914286 | 127.742857 | 74.085714 |
| std | 10.246951 | 92.089261 | 23.243432 | 100.41395 | 10.293834 |
| min | 1980.0 | 7.0 | 11.0 | 14.0 | 50.0 |
| 25% | 1988.5 | 40.5 | 31.0 | 48.5 | 69.5 |
| 50% | 1997.0 | 72.0 | 54.0 | 82.0 | 75.0 |
| 75% | 2005.5 | 132.0 | 69.5 | 199.5 | 84.0 |
| max | 2014.0 | 382.0 | 88.0 | 355.0 | 88.0 |
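
To see how the rows of this table are computed, the `year` column can be reproduced on its own. The sketch below assumes the column holds the 35 consecutive years from 1980 through 2014, which is consistent with the count, min, and max above; `years` and `stats` are illustrative names.

```python
import pandas as pd

# Assumption: the year column holds the 35 consecutive years 1980 through 2014.
years = pd.Series(range(1980, 2015), name="year")
stats = years.describe()
print(stats)

# std in describe() is the sample standard deviation (ddof=1),
# and the 50% row is the median.
print(years.std(ddof=1), years.median())
```

The quartile rows use linear interpolation between neighboring values, which is why the 25% and 75% entries for `year` end in .5.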


### Computing number of profitable IPOs for each group

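Add two columns with the number of profitable IPOs for (1) Tech & Biotech and (2) Other, computed as the percentage profitable times the number of IPOs. Because the source percentages are rounded to whole numbers, the results are estimates and are generally not whole numbers.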

```python
ipos['num_prof_tech'] = ( (ipos['percentprof_tech'] / 100) * ipos['num_ipos_tech'] )
ipos['num_prof_other'] = ( (ipos['percentprof_other'] / 100) * ipos['num_ipos_other'] )
ipos.head()
```

| | year | num_ipos_tech | percentprof_tech | num_ipos_other | percentprof_other | num_prof_tech | num_prof_other |
|---|---|---|---|---|---|---|---|
| 0 | 1980 | 25 | 88 | 46 | 70 | 22.0 | 32.2 |
| 1 | 1981 | 82 | 81 | 110 | 85 | 66.42 | 93.5 |
| 2 | 1982 | 44 | 82 | 33 | 79 | 36.08 | 26.07 |
| 3 | 1983 | 194 | 68 | 257 | 86 | 131.92 | 221.02 |
| 4 | 1984 | 52 | 81 | 121 | 84 | 42.12 | 101.64 |
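
Since a count of companies should be a whole number, one option is to round the computed values. The sketch below repeats the calculation above on the first five tech rows and adds a rounded column; `num_prof_tech_rounded` is a hypothetical column name that is not part of the original notebook.

```python
import pandas as pd

# First five rows of the cleaned data shown above (tech columns only).
sample = pd.DataFrame({
    "year": [1980, 1981, 1982, 1983, 1984],
    "num_ipos_tech": [25, 82, 44, 194, 52],
    "percentprof_tech": [88.0, 81.0, 82.0, 68.0, 81.0],
})

# Same calculation as above: percent -> fraction, times the number of IPOs.
sample["num_prof_tech"] = sample["percentprof_tech"] / 100 * sample["num_ipos_tech"]

# Hypothetical extra column: nearest whole number of profitable IPOs.
sample["num_prof_tech_rounded"] = sample["num_prof_tech"].round().astype(int)
print(sample)
```

Keeping the unrounded column alongside the rounded one avoids compounding rounding error if the values are summed later.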
