![](https://entrackr.com/wp-content/uploads/2019/10/startup-image-01__1506587489_150.242.73.142-1200x600.jpg)

Coming up with a strong startup idea can feel difficult for aspiring entrepreneurs, especially when it seems that every good business idea has already been taken. Still, it’s entirely possible to succeed by improving on existing products or putting a unique spin on an old idea. The benefits of self-employment can make the effort of launching a startup worthwhile: in addition to the freedom of being your own boss, starting a business brings more independence, greater job satisfaction, and potentially uncapped earnings.

In [None]:
import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
import seaborn as sns
import matplotlib.pyplot as plt
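pd.options.mode.chained_assignment = None  # silence SettingWithCopyWarning
pd.set_option('display.max_columns', None)  # show every column when printing
# Read the CSV with an explicit encoding
df = pd.read_csv('/kaggle/input/startup-investments-crunchbase/investments_VC.csv', encoding='unicode_escape')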
df.head()

In [None]:
df.shape

In [None]:
df.describe()

**Observations:**
* Founding years in the dataset range from 1902 to 2014.
* 75% of the companies are recent, having been founded after 2006.
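
The observations above are read off `df.describe()`. As a minimal sketch of the same check done directly on the year column, the hypothetical helper below (`founding_year_summary` is not part of the original notebook) returns the range and quartiles of a founding-year series. The toy values are made up for illustration and are not taken from the dataset.

```python
import pandas as pd


def founding_year_summary(years: pd.Series) -> dict:
    """Return min, max, and quartiles of a founding-year column, ignoring NaN."""
    clean = pd.to_numeric(years, errors="coerce").dropna()
    q = clean.quantile([0.25, 0.5, 0.75])
    return {
        "min": int(clean.min()),
        "max": int(clean.max()),
        "q25": float(q.loc[0.25]),
        "median": float(q.loc[0.5]),
        "q75": float(q.loc[0.75]),
    }


# Toy data for illustration only (assumption, not real dataset values)
toy_years = pd.Series([1902, 1999, 2006, 2008, 2010, 2012, 2014, None])
summary = founding_year_summary(toy_years)
```

On the real data this could be called as `founding_year_summary(df['founded_year'])`. If the 25th percentile comes out at 2006, roughly three quarters of the companies were founded in or after that year.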

### Missing Values

In [None]:
perct_missing_values = df.isnull().sum()*100 / len(df)
perct_missing_values.sort_values()

## Visualisation

### Founding Year

In [None]:
def year_group(row):
    if row['founded_year'] >= 1900 and row['founded_year'] <= 1925:
        row['founded_year_group'] = 'less_than_1925'
    elif row['founded_year'] > 1925 and row['founded_year'] <= 1950:
        row['founded_year_group'] = '1925_1950'
    elif row['founded_year'] > 1950 and row['founded_year'] <= 1975:
        row['founded_year_group'] = '1950_1975'
    elif row['founded_year'] > 1975 and row['founded_year'] <= 2000:
        row['founded_year_group'] = '1975_2000'
    elif row['founded_year'] > 2000:
        row['founded_year_group'] = '2000_2014'
    else:
        row['founded_year_group'] = ''
    return row['founded_year_group']

df['founded_year_group'] =  df.apply(year_group,axis =1)
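
The row-wise `apply` above works, but it can be slow on a large frame. A possible vectorized alternative, sketched here with `pd.cut`, uses the same bin edges and labels. The names `BIN_EDGES`, `BIN_LABELS`, and `year_group_vectorized` are hypothetical, and one behaviour differs: missing or pre-1900 years become `NaN` rather than an empty string.

```python
import numpy as np
import pandas as pd

# Right-closed bins: (1899, 1925], (1925, 1950], ..., (2000, inf)
BIN_EDGES = [1899, 1925, 1950, 1975, 2000, np.inf]
BIN_LABELS = ['less_than_1925', '1925_1950', '1950_1975', '1975_2000', '2000_2014']


def year_group_vectorized(years: pd.Series) -> pd.Series:
    """Bin founding years into the same groups as year_group, without a Python loop."""
    return pd.cut(years, bins=BIN_EDGES, labels=BIN_LABELS, right=True)


# Toy frame for illustration only (assumption, not real dataset values)
toy = pd.DataFrame({'founded_year': [1902.0, 1925.0, 1926.0, 2000.0, 2001.0, np.nan]})
toy['founded_year_group'] = year_group_vectorized(toy['founded_year'])
```

Because `pd.cut` returns an ordered categorical, a count plot of this column keeps the groups in chronological order without an explicit `order` argument.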

In [None]:
plt.figure(figsize = (16,7))
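# Drop only rows with a missing founding year; a bare dropna() discards any row with a gap in any column
sns.countplot(x='founded_year_group', data=df.dropna(subset=['founded_year']))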
plt.show()

The majority of the startups were founded after 2000.

### Startups Since 2000

In [None]:
# Copy the slice so the cast below operates on an independent frame
df_new = df[df['founded_year'] >= 2000].copy()
df_new['founded_year'] = df_new['founded_year'].astype(int)
plt.figure(figsize = (16,7))
sns.countplot(x = 'founded_year', data = df_new)
plt.show()

The number of startups founded each year rises sharply over this period.
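
One way to test how fast the counts grow is to look at the year-over-year change: under exponential growth the relative increase stays roughly constant from year to year. The sketch below assumes a hypothetical helper `yearly_growth` and uses made-up toy years, not values from the dataset.

```python
import pandas as pd


def yearly_growth(years: pd.Series) -> pd.DataFrame:
    """Count companies per founding year and compute the relative change vs. the previous year."""
    counts = years.dropna().astype(int).value_counts().sort_index()
    out = pd.DataFrame({'count': counts, 'growth': counts.pct_change()})
    out.index.name = 'founded_year'
    return out


# Toy data for illustration only: counts double each year (1, 2, 4)
toy_years = pd.Series([2000, 2001, 2001, 2002, 2002, 2002, 2002])
growth = yearly_growth(toy_years)
```

On the real data this could be called as `yearly_growth(df_new['founded_year'])`. A `growth` column that shrinks or turns negative in later years would argue against describing the trend as exponential.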

### Top 10 Market Leaders

In [None]:
plt.figure(figsize=(16,7))
# The column name in this CSV includes the surrounding spaces
sns.countplot(x=' market ', data=df, order=df[' market '].value_counts().iloc[:10].index)
plt.xticks(rotation=30)
plt.show()

The largest groups of startups are in the software and biotechnology markets.

### Country of Origin

In [None]:
plt.figure(figsize=(16,7))
g = sns.countplot(x ='country_code', data = df, order=df['country_code'].value_counts().iloc[:10].index)
plt.xticks(rotation=30)
plt.show()

The majority of the startups in this dataset are from the USA.
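
A count plot shows the ranking but not the proportions. To check a "majority" claim numerically, one option is `value_counts(normalize=True)`, which returns each category's share of the non-missing rows. The helper name `top_shares` and the toy country codes below are assumptions for illustration only.

```python
import pandas as pd


def top_shares(values: pd.Series, n: int = 10) -> pd.Series:
    """Return the n most frequent categories as fractions of the non-missing rows."""
    return values.value_counts(normalize=True).head(n)


# Toy data for illustration only (assumption, not real dataset values)
toy_codes = pd.Series(['USA', 'USA', 'USA', 'GBR', 'CAN', None])
shares = top_shares(toy_codes)
```

On the real data, `top_shares(df['country_code'])` would show whether the top country accounts for more than half of the companies with a known country.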

### Top States in USA

In [None]:
df_USA = df[(df['country_code'] =='USA')]
plt.figure(figsize=(16,7))
# Rank states using the USA subset only, not the full frame
g = sns.countplot(x='state_code', data=df_USA, order=df_USA['state_code'].value_counts().iloc[:10].index)
plt.xticks(rotation=30)
plt.show()

California has more startups than any other state in the USA.

### Company Status

In [None]:
plt.figure(figsize = (8,8))
# explode has one entry per slice, so this assumes exactly three status values
df.status.value_counts().plot(kind='pie', shadow=True, explode=(0, 0, 0.15), startangle=90, autopct='%1.1f%%')
plt.title('Status')
plt.show()
