# DATAFRAME
---

A DataFrame is a 2-dimensional data structure of rows and columns, similar to a spreadsheet.

## Creating DataFrame from Lists

You can build a DataFrame by starting with an empty one and adding columns to it one at a time. Each column is created from a list of values, and every list must have the same length.

In [4]:
import pandas as pd

In [5]:
# creating an empty DataFrame
company = pd.DataFrame()

# creating data collections as lists
company_months = ['January','February','March']
company_income = [23500,19700,31150]

# adding columns to DataFrame
company['Month'] = company_months
company['Income'] = company_income

# displaying DataFrame contents
company

,Month,Income
0,January,23500
1,February,19700
2,March,31150


### Tasks

Complete the data by adding the remaining months of the first half of the year and the income earned in these months. Then, display the data again.

In [9]:
import pandas as pd

company = pd.DataFrame()

# creating data collections as lists
company_months = ['January','February','March']
company_income = [23500,19700,31150]

# adding columns to DataFrame
company['Month'] = company_months
company['Income'] = company_income

# remaining months of the first half of the year and the income earned in them
remaining_months = ['April', 'May', 'June']
remaining_income = [25000, 21000, 28000]

# Add the new data to the DataFrame
company = pd.concat([company, pd.DataFrame({'Month': remaining_months, 'Income': remaining_income})], ignore_index=True)

# Display the updated DataFrame
company


,Month,Income
0,January,23500
1,February,19700
2,March,31150
3,April,25000
4,May,21000
5,June,28000
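
The solution above passes `ignore_index=True` to `pd.concat`. As a minimal sketch of why that matters, the example below joins the same two quarters with and without the option; the names `first`, `second`, `kept` and `renumbered` are illustrative only and do not appear in the notebook.

```python
import pandas as pd

# Two small DataFrames, each with its own default index 0, 1, 2
first = pd.DataFrame({'Month': ['January', 'February', 'March'],
                      'Income': [23500, 19700, 31150]})
second = pd.DataFrame({'Month': ['April', 'May', 'June'],
                       'Income': [25000, 21000, 28000]})

# Without ignore_index the original row labels are kept, so they repeat
kept = pd.concat([first, second])
print(kept.index.tolist())        # [0, 1, 2, 0, 1, 2]

# With ignore_index=True the rows are renumbered from 0
renumbered = pd.concat([first, second], ignore_index=True)
print(renumbered.index.tolist())  # [0, 1, 2, 3, 4, 5]
```

Repeated row labels are legal in pandas, but they make label-based selection ambiguous, which is why renumbering is usually the safer choice here.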


Display descriptive statistics for income earned.

In [10]:
company.describe()


,Income
count,6.0
mean,24725.0
std,4306.013237
min,19700.0
25%,21625.0
50%,24250.0
75%,27250.0
max,31150.0
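
Each row of the `describe()` output can also be computed on its own. The sketch below, which assumes the six income values used above and stores them in an illustrative Series named `income`, shows the individual pandas methods behind some of the rows.

```python
import pandas as pd

# The six monthly income values from the first half of the year
income = pd.Series([23500, 19700, 31150, 25000, 21000, 28000], name='Income')

stats = income.describe()

print(income.count())         # 'count' row: number of non-missing values
print(income.mean())          # 'mean' row: arithmetic mean
print(income.std())           # 'std' row: sample standard deviation (divides by n - 1)
print(income.quantile(0.25))  # '25%' row: first quartile
print(income.median())        # '50%' row: median
print(income.max())           # 'max' row: largest value
```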


Display company income for the months of the second quarter.

In [11]:
second_quarter = company[company['Month'].isin(['April', 'May', 'June'])]
second_quarter


,Month,Income
3,April,25000
4,May,21000
5,June,28000
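
The filter in the cell above works in two steps, which the following sketch separates. It rebuilds the six-month DataFrame so that it runs on its own; the intermediate name `mask` is illustrative only.

```python
import pandas as pd

company = pd.DataFrame({
    'Month': ['January', 'February', 'March', 'April', 'May', 'June'],
    'Income': [23500, 19700, 31150, 25000, 21000, 28000],
})

# Step 1: isin() returns a boolean Series, True where the month is in the list
mask = company['Month'].isin(['April', 'May', 'June'])
print(mask.tolist())  # [False, False, False, True, True, True]

# Step 2: indexing with the boolean Series keeps only the rows marked True
second_quarter = company[mask]
print(second_quarter)
```

The selected rows keep their original labels (3, 4, 5), which matches the output shown above.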


Display descriptive statistics for the income earned in the months of the second quarter.

In [12]:

second_quarter.describe()


,Income
count,3.0
mean,24666.666667
std,3511.884584
min,21000.0
25%,23000.0
50%,25000.0
75%,26500.0
max,28000.0


## Creating DataFrame from 2D List

Instead of adding each column separately, you can create a DataFrame from a two-dimensional (2D) list, in which each inner list holds one row. Note that you will then need to supply names for the columns you create.

In [None]:
# creating data collection as 2D list
company_data = [
    ['January',23500],
    ['February',19700],
    ['March',31150]
    ]

# creating DataFrame with column names
company = pd.DataFrame(data=company_data, columns=['Month','Income'])

# displaying DataFrame contents
company

,Month,Income
0,January,23500
1,February,19700
2,March,31150
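
To illustrate why the column names are needed, the sketch below builds the same DataFrame without the `columns` argument and then names the columns afterwards. The variable name `unnamed` is illustrative only.

```python
import pandas as pd

company_data = [
    ['January', 23500],
    ['February', 19700],
    ['March', 31150],
]

# Without column names, pandas labels the columns with integers
unnamed = pd.DataFrame(data=company_data)
print(list(unnamed.columns))  # [0, 1]

# The names can also be assigned after the DataFrame has been created
unnamed.columns = ['Month', 'Income']
print(list(unnamed.columns))  # ['Month', 'Income']
```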


### Tasks

The table below lists the university's students.

StudentID | Name  | Surname | Age | Program
----------|-------|---------|-----|-----------------
902311    | Peter | Red     | 21  | Accounting
915027    | Sofia | White   | 19  | Computer Science
900004    | Jack  | Grey    | 24  | Accounting
994031    | Mark  | Brown   | 22  | Engineering

Create a DataFrame using a 2D list. Then, display the contents of the DataFrame.

Calculate and display the average age of students.

In [13]:
# creating data collection as a 2D list (one inner list per student)
students_data = [
    [902311, 'Peter', 'Red', 21, 'Accounting'],
    [915027, 'Sofia', 'White', 19, 'Computer Science'],
    [900004, 'Jack', 'Grey', 24, 'Accounting'],
    [994031, 'Mark', 'Brown', 22, 'Engineering']
    ]

# creating DataFrame with column names
students = pd.DataFrame(data=students_data, columns=['StudentID','Name','Surname','Age','Program'])

# displaying DataFrame contents (shown only when this is the last line of a cell)
students

# calculating the average age of students
students['Age'].mean()

21.5

## Creating DataFrame from Dictionary

A dictionary stores data as key-value pairs, with each key separated from its value by a colon. Each pair represents one column in the DataFrame: the key is the column name and the value is the column's data (a list). Below is an example of creating a DataFrame from a dictionary.

In [14]:
# creating data collection as a dictionary
company_data = {
    'Month':['January','February','March'],
    'Income':[23500,19700,31150]
    }

# creating DataFrame
company = pd.DataFrame(data=company_data)

# displaying DataFrame contents
company

,Month,Income
0,January,23500
1,February,19700
2,March,31150


### Tasks

Complete the DataFrame by adding a 'Tax' column along with the following values: 1200, 2350, 995. Then, display DataFrame contents.

Display descriptive statistics for the company income and tax.
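
One possible way to approach these two tasks is sketched below. It is not a solution from the original notebook: it rebuilds the dictionary-based DataFrame so that it runs on its own, and the name `stats` is illustrative only.

```python
import pandas as pd

company = pd.DataFrame(data={
    'Month': ['January', 'February', 'March'],
    'Income': [23500, 19700, 31150],
})

# adding the 'Tax' column; the list needs one value per existing row
company['Tax'] = [1200, 2350, 995]
print(company)

# descriptive statistics for the two numeric columns
stats = company[['Income', 'Tax']].describe()
print(stats)
```

Selecting the columns with a list (`[['Income', 'Tax']]`) makes it explicit which columns are summarized, although `describe()` on the whole DataFrame would skip the text column `Month` by default as well.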

## Creating DataFrame from File

Creating a DataFrame from the data in a CSV file takes a single call to the read_csv() function, which receives the path to the file. By default, the first line of the file supplies the column names.

In [18]:
sales = pd.read_csv('product_sales.csv')
sales

,SaleRep,Region,Orders,TotalSales
0,Felice Lunck,West,218,44489
1,Doralynn Pesak,West,233,61035
2,Madelle Martland,East,264,62603
3,Yasmin Myhan,South,110,59377
4,Marmaduke Webbe,East,188,78771
5,Christiano Vero,East,265,68506
6,Cecelia Jealous,West,93,53634
7,Isaak Housiaux,East,189,62455
8,Derril Howland,East,385,73460
9,Judon Allom,West,230,51067
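
The file product_sales.csv itself is not shown here, so the sketch below is only an assumption about its layout, inferred from the output above. It reads the first three rows from an in-memory string through `io.StringIO` instead of a file on disk; `csv_text` and `sales_sample` are illustrative names.

```python
import io
import pandas as pd

# Assumed file layout: a header line followed by one line per sales rep
csv_text = """SaleRep,Region,Orders,TotalSales
Felice Lunck,West,218,44489
Doralynn Pesak,West,233,61035
Madelle Martland,East,264,62603
"""

# read_csv accepts a file path or any file-like object
sales_sample = pd.read_csv(io.StringIO(csv_text))

print(sales_sample.shape)   # (3, 4): three rows, four columns
print(sales_sample.dtypes)  # numeric columns are detected automatically
```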


### Tasks

For sales data, calculate and display the average number of orders.


In [24]:
sales['Orders'].mean()


217.5

For sales data, calculate and display the total sales value.

In [29]:
sales['TotalSales'].sum()


615397