Importing the Pandas library

In [None]:
import pandas as pd

Let's look at a dataset containing information about Indian startups from 1984 to 2022. We will learn how to query this dataset to extract the information we need.

In [None]:
df = pd.read_csv('indian_startups.csv')
df.head()

In [None]:
df.info()

In [None]:
df.describe()

Let's first find the startups that were founded in Bengaluru after 2015, have 5 or more investors, and have a funding amount greater than $50 million.

In [None]:
df.columns

In [None]:
df[
    (df['Starting Year'] > 2015) &
    (df['City'] == 'Bengaluru') &
    (df['No. of Investors'] >= 5) &
    (df['Funding Amount in $'] > 50000000)
]
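
To see how the combined filter above works, here is a minimal sketch on a small made-up table. The column names mirror the ones used above, but the rows and the variable names (`toy`, `mask`, and so on) are invented for illustration and are not part of the dataset.

```python
import pandas as pd

# Toy data: column names match the notebook, the rows are made up.
toy = pd.DataFrame({
    'Company': ['A', 'B', 'C', 'D'],
    'City': ['Bengaluru', 'Bengaluru', 'Mumbai', 'Bengaluru'],
    'Starting Year': [2016, 2014, 2018, 2019],
    'No. of Investors': [6, 8, 5, 3],
    'Funding Amount in $': [60_000_000, 90_000_000, 70_000_000, 80_000_000],
})

# Each comparison produces a boolean Series with one value per row.
founded_after_2015 = toy['Starting Year'] > 2015
in_bengaluru = toy['City'] == 'Bengaluru'
enough_investors = toy['No. of Investors'] >= 5
well_funded = toy['Funding Amount in $'] > 50_000_000

# & combines the Series element-wise: a row is kept only if all four are True.
mask = founded_after_2015 & in_bengaluru & enough_investors & well_funded
result = toy[mask]
print(result)
```

When the conditions are written inline, as in the cell above, each one needs its own parentheses because `&` binds more tightly than comparison operators such as `>` and `==`.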

Now, let's find the startups in Mumbai that mention 'machine learning' in their 'Industries' column, ignoring case when matching.

In [None]:
df.columns

In [None]:
df[
    # na=False treats rows with a missing 'Industries' value as non-matches
    (df['Industries'].str.contains('Machine Learning', case=False, na=False)) &
    (df['City'] == 'Mumbai')
]

In [None]:
# We can verify this by checking out one of the selected rows
df.loc[237, 'Industries']
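
The sketch below illustrates case-insensitive substring matching on a small made-up table. The rows and the names `toy`, `ml_mask`, and `ml_in_mumbai` are assumptions for illustration. It also passes `na=False`, since a missing value in a text column otherwise yields a missing value in the mask, and such a mask may be rejected when used for filtering.

```python
import pandas as pd

# Toy data: one row has no 'Industries' value at all.
toy = pd.DataFrame({
    'Company': ['A', 'B', 'C', 'D'],
    'City': ['Mumbai', 'Mumbai', 'Mumbai', 'Pune'],
    'Industries': [
        'Artificial Intelligence, MACHINE LEARNING',
        'E-Commerce',
        None,
        'Machine Learning',
    ],
})

# case=False ignores letter case; na=False counts missing values as "no match".
ml_mask = toy['Industries'].str.contains('machine learning', case=False, na=False)

ml_in_mumbai = toy[ml_mask & (toy['City'] == 'Mumbai')]
print(ml_in_mumbai)
```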

Let's find the companies that have more than 100 employees and fewer than 5 funding rounds. This should give us a rough idea of which companies are growing well on limited funding.

In [None]:
df.info()

In [None]:
df[
    (df['No. of Employees'].isin(['101-250', '251-500', '501-1000', '1001-5000', '5000+'])) &
    (df['Funding Round'] < 5)
]
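
The filter above uses `.isin()` rather than a numeric comparison, which suggests that 'No. of Employees' is stored as text ranges. Under that assumption, the following sketch shows the same idea on made-up rows; the bucket labels are copied from the cell above, and everything else is hypothetical.

```python
import pandas as pd

# Toy data: employee counts are text buckets, not numbers.
toy = pd.DataFrame({
    'Company': ['A', 'B', 'C', 'D'],
    'No. of Employees': ['11-50', '101-250', '1001-5000', '251-500'],
    'Funding Round': [2, 3, 7, 4],
})

# Buckets whose lower bound is above 100.
large_buckets = ['101-250', '251-500', '501-1000', '1001-5000', '5000+']

is_large = toy['No. of Employees'].isin(large_buckets)  # membership test per row
few_rounds = toy['Funding Round'] < 5                   # ordinary numeric comparison

lean_growers = toy[is_large & few_rounds]
print(lean_growers)
```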

Let's find the startups in Mumbai founded before 2013, and display only the columns for company name, funding amount, and the year of founding.

In [None]:
df[(df['City'] == 'Mumbai') & (df['Starting Year'] < 2013)][['Company', 'Funding Amount in $', 'Starting Year']]

In [None]:
# The same selection with a single .loc[] call: row condition first, then the column list
df.loc[
    (df['City'] == 'Mumbai') & (df['Starting Year'] < 2013),
    ['Company', 'Funding Amount in $', 'Starting Year']
]
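
As a quick check that the two approaches select the same data, here is a minimal sketch on a made-up table (the rows and the names `toy`, `chained`, and `single` are assumptions for illustration).

```python
import pandas as pd

toy = pd.DataFrame({
    'Company': ['A', 'B', 'C'],
    'City': ['Mumbai', 'Delhi', 'Mumbai'],
    'Starting Year': [2010, 2011, 2016],
    'Funding Amount in $': [1_000_000, 2_000_000, 3_000_000],
})

cols = ['Company', 'Funding Amount in $', 'Starting Year']
mask = (toy['City'] == 'Mumbai') & (toy['Starting Year'] < 2013)

chained = toy[mask][cols]     # two steps: filter the rows, then pick the columns
single = toy.loc[mask, cols]  # one step: rows and columns in the same call

same = chained.equals(single)
print(same)
```

For reading data, either form works. The single `.loc[]` call is generally the safer habit, because assigning through a chained selection can end up modifying a temporary copy rather than the original DataFrame.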

Let's view the first 10 companies in our dataset and display only their names, cities, and funding amounts.

In [None]:
df.iloc[:10][['Company', 'City', 'Funding Amount in $']]

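Alternatively, we could do this using a single `.loc[]`. Both versions below rely on the default integer index (0, 1, 2, ...) that `read_csv` assigns. Note that a `.loc[]` slice includes its end label, so `:9` returns 10 rows: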

In [None]:
df.loc[df.index < 10, ['Company', 'City', 'Funding Amount in $']]

In [None]:
df.loc[:9, ['Company', 'City', 'Funding Amount in $']]
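
The difference between position-based and label-based slicing is easy to miss, so here is a small sketch on made-up data. The names `toy` and `relabeled` are hypothetical.

```python
import pandas as pd

toy = pd.DataFrame({'Company': ['A', 'B', 'C', 'D']})  # default index 0..3

by_position = toy.iloc[:2]  # positions 0 and 1; the end position is excluded
by_label = toy.loc[:1]      # labels 0 and 1; the end label is included

# With the default index the two selections coincide.
same_rows = by_position.equals(by_label)

# After relabeling the rows, positions and labels no longer line up.
relabeled = toy.set_index(pd.Index([10, 11, 12, 13]))
first_two = relabeled.iloc[:2]  # still the first two rows
up_to_11 = relabeled.loc[:11]   # rows labeled 10 and 11

print(same_rows, first_two['Company'].tolist(), up_to_11['Company'].tolist())
```

If the DataFrame has been filtered or sorted so that its index is no longer 0, 1, 2, ..., `.iloc[]` is the reliable way to take the first few rows.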

How could we find the startups with more than one founder that have received over $1 billion in funding?

In [None]:
df[
    # A comma in 'Founders' indicates more than one founder;
    # na=False treats rows with a missing 'Founders' value as non-matches
    (df['Founders'].str.contains(',', na=False)) &
    (df['Funding Amount in $'] > 1000000000)
]
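
The comma check assumes that founder names are separated by commas within a single text field. Under the same assumption, the sketch below shows the check on made-up rows and also derives an approximate founder count; the names `toy`, `has_cofounders`, `founder_count`, and `big_teams` are hypothetical.

```python
import pandas as pd

# Toy data: one company has no founder information.
toy = pd.DataFrame({
    'Company': ['A', 'B', 'C'],
    'Founders': ['Asha Rao', 'Ravi Kumar, Meera Shah', None],
    'Funding Amount in $': [2_000_000_000, 1_500_000_000, 3_000_000_000],
})

# regex=False matches the comma literally; na=False counts missing values as "no match".
has_cofounders = toy['Founders'].str.contains(',', regex=False, na=False)

# Number of commas plus one approximates the number of founders (missing stays missing).
founder_count = toy['Founders'].str.count(',') + 1

big_teams = toy[has_cofounders & (toy['Funding Amount in $'] > 1_000_000_000)]
print(big_teams)
```

This heuristic would miscount entries where a comma appears inside a single name or where founders are separated by some other character, so it is worth spot-checking a few rows.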