# **Creating, Reading and Writing**
You can't work with data if you can't read it. Get started here.

To use pandas, you'll typically start with the following line of code.

In [1]:
import pandas as pd

## **Creating data**

There are two core objects in pandas: the **DataFrame** and the **Series**.



### **DataFrame**

A DataFrame is a table. It contains an array of individual entries, each of which has a certain value. Each entry corresponds to a row (or record) and a column.

In [4]:
pd.DataFrame({'YES': [5, 4], 'NO': [69, 420]})

Unnamed: 0,YES,NO
0,5,69
1,4,420


We can also build a DataFrame from a list of rows, passing the column names separately:

In [26]:
fruits = pd.DataFrame([[30, 21]], columns= ['Apples', 'Bananas'])
fruits

Unnamed: 0,Apples,Bananas
0,30,21


DataFrame entries are not limited to integers. For instance, here's a DataFrame whose values are strings:

In [7]:
pd.DataFrame({'Name': ['Shanto', 'Sakib', 'Sadek', 'Monju', 'Billal', 'Rafi', 'Ferose', 'Robiul'], 
              'Address': ['Munshigonj', 'Cumilla', 'Cumilla', 'Cumilla', 'Narail','Cumilla','Gaibandha', 'Narshingdi']})

Unnamed: 0,Name,Address
0,Shanto,Munshigonj
1,Sakib,Cumilla
2,Sadek,Cumilla
3,Monju,Cumilla
4,Billal,Narail
5,Rafi,Cumilla
6,Ferose,Gaibandha
7,Robiul,Narshingdi


We are using the pd.DataFrame() constructor to generate these DataFrame objects. The syntax for declaring a new one is a dictionary whose keys are the column names (Name and Address in this example), and whose values are lists of entries. This is the standard way of constructing a new DataFrame, and the one you are most likely to encounter.
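As a minimal sketch of how the dictionary maps onto the table (the variable name `people` and the shortened data are only for illustration): each key becomes a column label and each list supplies that column's values, so the lists should all have the same length.

```python
import pandas as pd

# Each dictionary key becomes a column; each list holds that column's values.
people = pd.DataFrame({
    'Name': ['Shanto', 'Sakib', 'Sadek'],
    'Address': ['Munshigonj', 'Cumilla', 'Cumilla'],
})

print(list(people.columns))  # column labels come from the dictionary keys
print(people.shape)          # (number of rows, number of columns)
```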

The dictionary-list constructor assigns values to the column labels, but just uses an ascending count from 0 (0, 1, 2, 3, ...) for the row labels. Sometimes this is OK, but oftentimes we will want to assign these labels ourselves.

The list of row labels used in a DataFrame is known as an **Index**. We can assign values to it by using an index parameter in our constructor:

In [8]:
pd.DataFrame({'Name': ['Shanto', 'Sakib', 'Sadek', 'Monju', 'Billal', 'Rafi', 'Ferose', 'Robiul'], 
              'Address': ['Munshigonj', 'Cumilla', 'Cumilla', 'Cumilla', 'Narail','Cumilla','Gaibandha', 'Narshingdi']},
             index = ['COO', 'Posts', 'Production', 'Designer', 'Designer', 'Designer', 'Legal', 'Legal'])

Unnamed: 0,Name,Address
COO,Shanto,Munshigonj
Posts,Sakib,Cumilla
Production,Sadek,Cumilla
Designer,Monju,Cumilla
Designer,Billal,Narail
Designer,Rafi,Cumilla
Legal,Ferose,Gaibandha
Legal,Robiul,Narshingdi


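The same kind of table can also be built from a list of rows, passing the column names and the index separately:

In [27]: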
pd.DataFrame([['Shanto', 'Munshigonj'], ['Sakib', 'Cumilla'], ['Sadek', 'Cumilla'], ['Monju', 'Cumilla'], ['Billal', 'Narail'], ['Rafi', 'Cumilla'], ['Ferose', 'Gaibandha'], ['Robiul', 'Narshingdhi']],
             columns= ['Name', 'Address'],
             index = ['COO', 'Posts', 'Production', 'Designer', 'Designer', 'Designer', 'Legal', 'Legal'])

Unnamed: 0,Name,Address
COO,Shanto,Munshigonj
Posts,Sakib,Cumilla
Production,Sadek,Cumilla
Designer,Monju,Cumilla
Designer,Billal,Narail
Designer,Rafi,Cumilla
Legal,Ferose,Gaibandha
Legal,Robiul,Narshingdhi
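The two construction styles used in this section (a dictionary of columns and a list of rows) should produce the same DataFrame when given the same data. The sketch below checks this on a shortened, illustrative version of the table; the variable names are assumptions made for the example.

```python
import pandas as pd

names = ['Shanto', 'Sakib', 'Sadek']
addresses = ['Munshigonj', 'Cumilla', 'Cumilla']
roles = ['COO', 'Posts', 'Production']

# Column-oriented: one list per column, keyed by column name.
from_dict = pd.DataFrame({'Name': names, 'Address': addresses}, index=roles)

# Row-oriented: one inner list per row, with column names passed separately.
rows = [[name, address] for name, address in zip(names, addresses)]
from_rows = pd.DataFrame(rows, columns=['Name', 'Address'], index=roles)

# DataFrame.equals compares values, column labels, and index labels.
print(from_dict.equals(from_rows))
```

Which style to use is mostly a matter of how the data arrives: column by column, or record by record.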


### **Series**
A Series, by contrast, is a sequence of data values. If a DataFrame is a table, a Series is a list. And in fact you can create one with nothing more than a list:

In [9]:
pd.Series([1, 3, 5, 7, 9])

0    1
1    3
2    5
3    7
4    9
dtype: int64

A Series is, in essence, a single column of a DataFrame. So you can assign row labels to the Series the same way as before, using an index parameter. However, a Series does not have a column name; it only has one overall name:

In [10]:
pd.Series([100, 300, 400],
          index = ['July', 'August', 'September'],
          name = 'Amino Case')

July         100
August       300
September    400
Name: Amino Case, dtype: int64

In [29]:
quantities = ['4 cups', '1 cup', '2 large', '1 can']
items = ['Flour', 'Milk', 'Eggs', 'Spam']
recipe = pd.Series(quantities, index=items, name='Dinner')
recipe

Flour     4 cups
Milk       1 cup
Eggs     2 large
Spam       1 can
Name: Dinner, dtype: object
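To illustrate the point that a Series is essentially a single column of a DataFrame, here is a small sketch: selecting one column with square brackets returns a Series whose name is the column label and whose index is the DataFrame's index. The data and the index labels (`Week 1`, `Week 2`) are made up for the example.

```python
import pandas as pd

fruits = pd.DataFrame({'Apples': [30, 41], 'Bananas': [21, 11]},
                      index=['Week 1', 'Week 2'])

# Selecting a single column returns a Series.
apples = fruits['Apples']

print(type(apples).__name__)  # the class of the selected column
print(apples.name)            # the Series name is the column label
print(list(apples.index))     # the row labels are shared with the DataFrame
```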

## **Reading Data Files**
Data can be stored in any of a number of different forms and formats. By far the most basic of these is the humble CSV file. When you open a CSV file you get something that looks like this:

Product A,Product B,Product C

30,21,9

35,34,1

41,11,11

So a CSV file is a table of values separated by commas. Hence the name: "Comma-Separated Values", or CSV.
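As a rough sketch of how such text turns into a table, the snippet below parses the same values from an in-memory string rather than a file. Using `io.StringIO` here is a choice made for this example so that it runs without any file on disk; `pd.read_csv()` accepts file-like objects as well as paths.

```python
import io
import pandas as pd

# The sample values from above, held in memory instead of in a file.
csv_text = "Product A,Product B,Product C\n30,21,9\n35,34,1\n41,11,11\n"

# The first line supplies the column names; each later line is one row.
products = pd.read_csv(io.StringIO(csv_text))
print(products)
```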

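We can load a CSV file into a DataFrame with the pd.read_csv() function. Here we read a file of wine reviews:

In [12]: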
wine_reviews = pd.read_csv("winemag-data-130k-v2.csv")

We can use the shape attribute to check how large the resulting DataFrame is:

In [14]:
wine_reviews.shape

(129971, 14)

We can examine the contents of the resulting DataFrame using the head() method, which returns the first five rows:

In [15]:
wine_reviews.head()

Unnamed: 0.1,Unnamed: 0,country,description,designation,points,price,province,region_1,region_2,taster_name,taster_twitter_handle,title,variety,winery
0,0,Italy,"Aromas include tropical fruit, broom, brimston...",Vulkà Bianco,87,,Sicily & Sardinia,Etna,,Kerin O’Keefe,@kerinokeefe,Nicosia 2013 Vulkà Bianco (Etna),White Blend,Nicosia
1,1,Portugal,"This is ripe and fruity, a wine that is smooth...",Avidagos,87,15.0,Douro,,,Roger Voss,@vossroger,Quinta dos Avidagos 2011 Avidagos Red (Douro),Portuguese Red,Quinta dos Avidagos
2,2,US,"Tart and snappy, the flavors of lime flesh and...",,87,14.0,Oregon,Willamette Valley,Willamette Valley,Paul Gregutt,@paulgwine,Rainstorm 2013 Pinot Gris (Willamette Valley),Pinot Gris,Rainstorm
3,3,US,"Pineapple rind, lemon pith and orange blossom ...",Reserve Late Harvest,87,13.0,Michigan,Lake Michigan Shore,,Alexander Peartree,,St. Julian 2013 Reserve Late Harvest Riesling ...,Riesling,St. Julian
4,4,US,"Much like the regular bottling from 2012, this...",Vintner's Reserve Wild Child Block,87,65.0,Oregon,Willamette Valley,Willamette Valley,Paul Gregutt,@paulgwine,Sweet Cheeks 2012 Vintner's Reserve Wild Child...,Pinot Noir,Sweet Cheeks
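The shape attribute and the head() method work the same way on any DataFrame. The sketch below uses a small made-up table (the name `numbers` is illustrative) so that it runs without the wine dataset; it also shows that head() takes an optional row count.

```python
import pandas as pd

# A small stand-in table with 8 rows and 2 columns.
numbers = pd.DataFrame({'n': range(8), 'square': [i * i for i in range(8)]})

print(numbers.shape)         # (rows, columns) -> (8, 2)
print(len(numbers.head()))   # head() defaults to the first 5 rows -> 5
print(len(numbers.head(3)))  # an explicit count returns that many rows -> 3
```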


The pd.read_csv() function accepts over 30 optional parameters. For example, the wine reviews CSV file has a built-in index column (shown above as Unnamed: 0), which pandas did not pick up automatically. To make pandas use that column for the index (instead of creating a new one from scratch), we can specify an index_col.

In [16]:
wine_reviews = pd.read_csv("winemag-data-130k-v2.csv", index_col=0)
wine_reviews.head()

Unnamed: 0,country,description,designation,points,price,province,region_1,region_2,taster_name,taster_twitter_handle,title,variety,winery
0,Italy,"Aromas include tropical fruit, broom, brimston...",Vulkà Bianco,87,,Sicily & Sardinia,Etna,,Kerin O’Keefe,@kerinokeefe,Nicosia 2013 Vulkà Bianco (Etna),White Blend,Nicosia
1,Portugal,"This is ripe and fruity, a wine that is smooth...",Avidagos,87,15.0,Douro,,,Roger Voss,@vossroger,Quinta dos Avidagos 2011 Avidagos Red (Douro),Portuguese Red,Quinta dos Avidagos
2,US,"Tart and snappy, the flavors of lime flesh and...",,87,14.0,Oregon,Willamette Valley,Willamette Valley,Paul Gregutt,@paulgwine,Rainstorm 2013 Pinot Gris (Willamette Valley),Pinot Gris,Rainstorm
3,US,"Pineapple rind, lemon pith and orange blossom ...",Reserve Late Harvest,87,13.0,Michigan,Lake Michigan Shore,,Alexander Peartree,,St. Julian 2013 Reserve Late Harvest Riesling ...,Riesling,St. Julian
4,US,"Much like the regular bottling from 2012, this...",Vintner's Reserve Wild Child Block,87,65.0,Oregon,Willamette Valley,Willamette Valley,Paul Gregutt,@paulgwine,Sweet Cheeks 2012 Vintner's Reserve Wild Child...,Pinot Noir,Sweet Cheeks
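The effect of index_col can be seen on a tiny in-memory CSV as well. This is a sketch with made-up text that mimics the layout of the wine file (a leading column with an empty header); `io.StringIO` is used only so the example runs without a file.

```python
import io
import pandas as pd

# The first header field is empty, as in a CSV saved with its row index.
csv_text = ",country,points\n0,Italy,87\n1,Portugal,87\n2,US,87\n"

# Without index_col, the unnamed column is read as ordinary data.
default = pd.read_csv(io.StringIO(csv_text))
print(list(default.columns))   # ['Unnamed: 0', 'country', 'points']

# With index_col=0, the first column becomes the row index instead.
indexed = pd.read_csv(io.StringIO(csv_text), index_col=0)
print(list(indexed.columns))   # ['country', 'points']
```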


## **Writing Files**

A DataFrame can be written back out to a CSV file with its to_csv() method.

In [33]:
animals = pd.DataFrame([[12, 21], [20, 19]],
                       columns=['Cows', 'Goats'],
                       index=['1st year', '2nd year'])
animals

Unnamed: 0,Cows,Goats
1st year,12,21
2nd year,20,19


In [34]:
animals.to_csv('cows_and_goats.csv')
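One way to check what was written is to read the file back. The sketch below repeats the write in a temporary directory (an assumption made so the example cleans up after itself) and reloads it with index_col=0, since to_csv() saves the row labels as the first column by default.

```python
import os
import tempfile
import pandas as pd

animals = pd.DataFrame([[12, 21], [20, 19]],
                       columns=['Cows', 'Goats'],
                       index=['1st year', '2nd year'])

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'cows_and_goats.csv')
    animals.to_csv(path)                        # row labels go in the first column
    restored = pd.read_csv(path, index_col=0)   # read them back as the index

print(restored)
```

If the row labels are not wanted in the file, to_csv() also accepts index=False.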