In this notebook I used SQL to perform various operations on the "startups" table. I counted the companies in the table and summed the "valuation" column to get the total value of all companies. I used the "MAX()" function to find the highest amount raised by any startup and the highest amount raised during the "Seed" stage, and the "MIN()" function to find the year in which the oldest company was founded.

I also combined the "SUM()" and "AVG()" aggregate functions with the "GROUP BY" clause to summarize the data by category and by location. I used the "HAVING" clause to filter groups on an aggregate value, such as selecting only the categories with an average valuation greater than a certain threshold. I used subqueries to extract data from a nested query, and the "WITH" clause to define temporary named result sets that simplify complex queries.
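As a minimal sketch of the last two techniques, the snippet below combines a "WITH" clause and a scalar subquery to list the categories whose average valuation is above the overall average. It uses Python's built-in sqlite3 module with a small in-memory table. The rows and the name `category_avg` are made up for illustration and are not taken from startups.csv.

```python
import sqlite3

# Hypothetical sample rows, not the real startups.csv data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE startups (name TEXT, category TEXT, valuation REAL)")
conn.executemany(
    "INSERT INTO startups VALUES (?, ?, ?)",
    [
        ("A", "Mobile", 10.0),
        ("B", "Mobile", 30.0),
        ("C", "Health", 100.0),
        ("D", "Health", 300.0),
    ],
)

# The WITH clause names an intermediate result (category_avg);
# the subquery in WHERE computes the overall average valuation.
query = """
WITH category_avg AS (
    SELECT category, AVG(valuation) AS avg_val
    FROM startups
    GROUP BY category
)
SELECT category, avg_val
FROM category_avg
WHERE avg_val > (SELECT AVG(valuation) FROM startups)
ORDER BY avg_val DESC
"""
above_average = conn.execute(query).fetchall()
print(above_average)
conn.close()
```

With these sample rows the overall average is 110, so only the category averaging 200 passes the filter.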

## Setup

In [1]:
import pandas as pd 

In [2]:
import sqlite3

In [3]:
df = pd.read_csv("startups.csv")

In [4]:
cnn = sqlite3.connect('jupyter_sql.db')

In [7]:
df.to_sql('startups',cnn, if_exists='replace')

In [8]:
%load_ext sql

In [9]:
%sql sqlite:///jupyter_sql.db

In [10]:
%%sql 

select * from startups

 * sqlite:///jupyter_sql.db
Done.


index,name,location,category,employees,raised,valuation,founded,stage,ceo,info
0,Pied Piper,Silicon Valley,Cloud Computing,6.0,5000000.0,50000000.0,2014,A,Richard Hendricks,A Middle-Out Compression Solution
1,Hooli,Silicon Valley,Enterprise,9000.0,580000000.0,49500000000.0,1997,,Gavin Bensen,Hooli Is About People
2,Raviga Capital,Silicon Valley,Venture Capital,12.0,300000000.0,3000000000.0,2012,,Peter Gregory,Share Only In Success
3,Aviato,Silicon Valley,Travel,3.0,250000.0,2500000.0,2006,Acquired,Erlich Bachman,Software Aggregation Program
4,SEE FOOD,Silicon Valley,Mobile,2.0,,15000000.0,2016,Acquired,Jian-Yang,The Shazam of Food
5,Forbid,New York,Mobile,25.0,1400000.0,5000000.0,2013,Acquired,Charlie Dattolo,Charge Users $10 for Calling Their Ex
6,Soulstice,New York,Fitness,300.0,30000000.0,120000000.0,2014,B,,What are your goals today?
7,E-Corp,New York,Enterprise,10000.0,,66000000000.0,2006,,Phillip Price,Together We Can Change the Wolrd
8,Allsafe Cybersecurity,New York,Security,250.0,123000000.0,1000000000.0,2014,,Gideon Goddard,
9,fsociety,Brooklyn,Games,5.0,,,2015,Stealth,Elliot Alderson,Fun Society Arcade


## Starting the code

### Calculate the total number of companies in the table.

In [11]:
%%sql
select count(*) as 'Number_of_companies'
from startups 
where name is not null;

 * sqlite:///jupyter_sql.db
Done.


Number_of_companies
70
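The query above filters on `name is not null` before counting. A small sketch, assuming a made-up three-row table rather than the real data, shows how `COUNT(*)` and `COUNT(column)` differ when NULLs are present: `COUNT(column)` skips NULLs on its own, so `COUNT(name)` gives the same result as `COUNT(*)` with the `where name is not null` filter.

```python
import sqlite3

# Hypothetical three-row table; the notebook itself loads startups.csv.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE startups (name TEXT, valuation REAL)")
conn.executemany(
    "INSERT INTO startups VALUES (?, ?)",
    [("Alpha", 100.0), ("Beta", None), (None, 50.0)],
)

# COUNT(*) counts every row; COUNT(col) counts rows where col is not NULL.
all_rows, named, valued = conn.execute(
    "SELECT COUNT(*), COUNT(name), COUNT(valuation) FROM startups"
).fetchone()

# Same number as COUNT(name), written the way the notebook does it.
filtered = conn.execute(
    "SELECT COUNT(*) FROM startups WHERE name IS NOT NULL"
).fetchone()[0]

print(all_rows, named, valued, filtered)
conn.close()
```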


### We want to know the total value of all companies in this table. Calculate this by getting the SUM() of the valuation column.

In [12]:
%%sql
select sum(valuation) as 'total_value_of_companies'
from startups;

 * sqlite:///jupyter_sql.db
Done.


total_value_of_companies
974455790000.0


### What is the highest amount raised by a startup? Return the maximum amount of money raised.

In [13]:
%%sql
select max(raised) as "maximum_amount_raised"
from startups;

 * sqlite:///jupyter_sql.db
Done.


maximum_amount_raised
11500000000.0


### Edit the query so that it returns the maximum amount of money raised during the 'Seed' stage.

In [14]:
%%sql
select max(raised) as "maximum_amount_raised"
from startups
where stage = 'Seed';

 * sqlite:///jupyter_sql.db
Done.


maximum_amount_raised
1800000.0
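When the same filter is run from Python instead of the `%%sql` magic, the stage can be passed as a bound parameter rather than typed into the SQL text. The sketch below assumes a made-up table, and the helper name `max_raised` is hypothetical. It also shows that `MAX()` ignores NULLs and returns NULL (`None` in Python) when no row matches.

```python
import sqlite3

# Hypothetical sample rows, not the real startups.csv data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE startups (name TEXT, stage TEXT, raised REAL)")
conn.executemany(
    "INSERT INTO startups VALUES (?, ?, ?)",
    [
        ("Alpha", "Seed", 1000.0),
        ("Beta", "Seed", 2500.0),
        ("Gamma", "A", 9000.0),
        ("Delta", "Seed", None),  # NULL is ignored by MAX()
    ],
)


def max_raised(connection, stage):
    """Return the largest amount raised at the given stage, or None."""
    row = connection.execute(
        "SELECT MAX(raised) FROM startups WHERE stage = ?", (stage,)
    ).fetchone()
    return row[0]


seed_max = max_raised(conn, "Seed")
stealth_max = max_raised(conn, "Stealth")  # no matching rows
print(seed_max, stealth_max)
conn.close()
```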


### In what year was the oldest company on the list founded?

In [15]:
%%sql
select min(founded)
from startups;

 * sqlite:///jupyter_sql.db
Done.


min(founded)
1994


## Let's compare valuations across different sectors:

### Return the average valuation in each category.

In [16]:
%%sql
select category, avg(valuation)
from startups
group by 1;

 * sqlite:///jupyter_sql.db
Done.


category,avg(valuation)
,4290000.0
Algorithms,7600000.0
Augmented Reality,8000000000.0
Big Data Analytics,15000000.0
Cloud Computing,95000000.0
Customer Service,640000000.0
Data Analytics,
E-commerce,60250000.0
Education,2023800000.0
Enterprise,38508333333.333336


### Return the average valuation in each category. Round the averages to two decimal places.

In [17]:
%%sql
select category, round(avg(valuation),2)
from startups
group by 1;

 * sqlite:///jupyter_sql.db
Done.


category,"round(avg(valuation),2)"
,4290000.0
Algorithms,7600000.0
Augmented Reality,8000000000.0
Big Data Analytics,15000000.0
Cloud Computing,95000000.0
Customer Service,640000000.0
Data Analytics,
E-commerce,60250000.0
Education,2023800000.0
Enterprise,38508333333.33


### Return the average valuation in each category. Round the averages to two decimal places. Lastly, order the list from highest averages to lowest.

In [18]:
%%sql
select category, round(avg(valuation),2) as 'valuation'
from startups
group by 1
order by 2 desc;

 * sqlite:///jupyter_sql.db
Done.


category,valuation
Health Care,380490000000.0
Enterprise,38508333333.33
Real Estate,20000000000.0
Travel,12501250000.0
Augmented Reality,8000000000.0
Security,6333333333.33
Technology,3100000000.0
Venture Capital,3000000000.0
Education,2023800000.0
Customer Service,640000000.0


## What are the most competitive markets?

### First, return the name of each category with the total number of companies that belong to it.

In [19]:
%%sql
select category, count(*) as 'number of companies'
from startups
group by 1
order by 2 desc;

 * sqlite:///jupyter_sql.db
Done.


category,number of companies
Social,12
Mobile,10
Education,5
Technology,3
Security,3
Fitness,3
Enterprise,3
E-commerce,3
,3
Virtual Reality,2


### Next, filter the result to only include categories that have at least three companies in them. What are the most competitive markets?

In [20]:
%%sql
select category, count(*) as 'number of companies'
from startups
group by 1
having count(*) >= 3
order by 2 desc;

 * sqlite:///jupyter_sql.db
Done.


category,number of companies
Social,12
Mobile,10
Education,5
Technology,3
Security,3
Fitness,3
Enterprise,3
E-commerce,3
,3


The most competitive markets, with at least three companies each, are Social, Mobile, Education, Technology, Security, Fitness, Enterprise, and E-commerce. Three more companies have no category (NULL), so they form a group of their own rather than a market.
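The choice between `>=` and `>` in the "HAVING" clause changes which categories survive. A minimal sketch, assuming a made-up table with group sizes of 4, 3, and 1, compares the two thresholds side by side.

```python
import sqlite3

# Hypothetical rows: 4 Social, 3 Mobile, 1 Fitness.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE startups (name TEXT, category TEXT)")
sample = (
    [(f"s{i}", "Social") for i in range(4)]
    + [(f"m{i}", "Mobile") for i in range(3)]
    + [("f0", "Fitness")]
)
conn.executemany("INSERT INTO startups VALUES (?, ?)", sample)

# "At least three": groups of exactly 3 are kept.
at_least_three = conn.execute(
    """
    SELECT category, COUNT(*) AS n
    FROM startups
    GROUP BY category
    HAVING COUNT(*) >= ?
    ORDER BY n DESC
    """,
    (3,),
).fetchall()

# "More than three": groups of exactly 3 are dropped.
more_than_three = conn.execute(
    """
    SELECT category, COUNT(*) AS n
    FROM startups
    GROUP BY category
    HAVING COUNT(*) > ?
    ORDER BY n DESC
    """,
    (3,),
).fetchall()

print(at_least_three)
print(more_than_three)
conn.close()
```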

## Let's see if there's a difference in startup sizes among different locations:

### What is the average size of a startup in each location?

In [22]:
%%sql
select location, avg(employees) as 'startup size'
from startups 
group by 1
order by avg(employees) desc;

 * sqlite:///jupyter_sql.db
Done.


location,startup size
San Francisco,1920.4
Silicon Valley,1804.6
New York,702.75
Brooklyn,502.6666666666667
Fort Lauderdale,500.0
New Delhi,250.0
Palo Alto,125.83333333333331
Omaha,65.0
Paris,30.0
Minneapolis,20.0


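There seems to be a big difference: among the locations shown, the average size ranges from 1,920.4 employees in San Francisco down to 20 in Minneapolis.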

### What is the average size of a startup in each location, with average sizes above 500?

In [23]:
%%sql
select location, avg(employees) as 'startup size'
from startups 
group by 1
having avg(employees) > 500 
order by avg(employees) desc;

 * sqlite:///jupyter_sql.db
Done.


location,startup size
San Francisco,1920.4
Silicon Valley,1804.6
New York,702.75
Brooklyn,502.6666666666667
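"HAVING" is needed here because the condition is on the group average, not on individual rows. The sketch below, assuming a small made-up table, contrasts it with "WHERE": "WHERE" drops rows before grouping, so the averages themselves change, while "HAVING" drops whole groups after the averages are computed.

```python
import sqlite3

# Hypothetical sample rows, not the real startups.csv data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE startups (location TEXT, employees REAL)")
conn.executemany(
    "INSERT INTO startups VALUES (?, ?)",
    [
        ("San Francisco", 2000.0),
        ("San Francisco", 100.0),
        ("New York", 600.0),
        ("New York", 200.0),
        ("Omaha", 65.0),
    ],
)

# HAVING: average over all companies per location, then keep large averages.
having_rows = conn.execute(
    """
    SELECT location, AVG(employees) AS size
    FROM startups
    GROUP BY location
    HAVING AVG(employees) > 500
    ORDER BY size DESC
    """
).fetchall()

# WHERE: discard small companies first, then average what is left.
where_rows = conn.execute(
    """
    SELECT location, AVG(employees) AS size
    FROM startups
    WHERE employees > 500
    GROUP BY location
    ORDER BY size DESC
    """
).fetchall()

print(having_rows)
print(where_rows)
conn.close()
```

In this sample, New York averages 400 employees and is removed by "HAVING", but it appears in the "WHERE" version because only its 600-employee company is averaged.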
