# Pandas ```Series``` v. ```DataFrames```

Before we can analyze data, we need to learn a few more fundamental aspects of how Pandas stores data.

Today, you'll learn to:

- differentiate between a Pandas Series and a DataFrame;
- slice a DataFrame;
- store a particular value from a DataFrame in a variable.

In [1]:
## import pandas
import pandas as pd

## Reading URLs with ```read_csv()```

Simply put, you can read a CSV file straight from a URL by running ```pd.read_csv("http://some_url.com")```.

Try it with <a href="https://raw.githubusercontent.com/sandeepmj/datasets/main/diamonds.csv">this URL</a>.

In [3]:
## read the CSV at the URL into a DataFrame

df = pd.read_csv("https://raw.githubusercontent.com/sandeepmj/datasets/main/diamonds.csv")
df

,carat,diamond cut,color,clarity,depth,table,price,x,y,z
0,0.23,Ideal,E,SI2,61.5,55.0,326,3.95,3.98,2.43
1,0.21,Premium,E,SI1,59.8,61.0,326,3.89,3.84,2.31
2,0.23,Good,E,VS1,56.9,65.0,327,4.05,4.07,2.31
3,0.29,Premium,I,VS2,62.4,58.0,334,4.20,4.23,2.63
4,0.31,Good,J,SI2,63.3,58.0,335,4.34,4.35,2.75
...,...,...,...,...,...,...,...,...,...,...
53935,0.72,Ideal,D,SI1,60.8,57.0,2757,5.75,5.76,3.50
53936,0.72,Good,D,SI1,63.1,55.0,2757,5.69,5.75,3.61
53937,0.70,Very Good,D,SI1,62.8,60.0,2757,5.66,5.68,3.56
53938,0.86,Premium,H,SI2,61.0,58.0,2757,6.15,6.12,3.74


In [4]:
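## A Series is one-dimensional (a single column); a DataFrame is two-dimensional (rows and columns)
## info() summarizes the DataFrame: each column's name, non-null count, and dtype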
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 53940 entries, 0 to 53939
Data columns (total 10 columns):
 #   Column       Non-Null Count  Dtype  
---  ------       --------------  -----  
 0   carat        53940 non-null  float64
 1   diamond cut  53940 non-null  object 
 2   color        53940 non-null  object 
 3   clarity      53940 non-null  object 
 4   depth        53940 non-null  float64
 5   table        53940 non-null  float64
 6   price        53940 non-null  int64  
 7   x            53940 non-null  float64
 8   y            53940 non-null  float64
 9   z            53940 non-null  float64
dtypes: float64(6), int64(1), object(3)
memory usage: 4.1+ MB
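
To make the Series/DataFrame distinction concrete, here is a minimal sketch. It assumes a small stand-in DataFrame called `sample` (a hypothetical name) built from three columns of the first five rows shown above, so it runs without downloading anything.

```python
import pandas as pd

# Stand-in for the diamonds data: three columns of the first five rows shown above.
# `sample` is a hypothetical name used only for this illustration.
sample = pd.DataFrame({
    "carat": [0.23, 0.21, 0.23, 0.29, 0.31],
    "diamond cut": ["Ideal", "Premium", "Good", "Premium", "Good"],
    "price": [326, 326, 327, 334, 335],
})

one_column = sample["carat"]        # single brackets return a Series (1-D)
one_col_frame = sample[["carat"]]   # double brackets return a DataFrame (2-D)

print(type(sample).__name__)        # DataFrame
print(type(one_column).__name__)    # Series
print(one_column.shape)             # (5,)
print(one_col_frame.shape)          # (5, 1)
```

The shape is the quickest tell: a Series has one dimension, while a DataFrame always has two, even when it holds a single column.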


In [5]:
df.carat

0        0.23
1        0.21
2        0.23
3        0.29
4        0.31
         ... 
53935    0.72
53936    0.72
53937    0.70
53938    0.86
53939    0.75
Name: carat, Length: 53940, dtype: float64

In [6]:
df.carat * 2

0        0.46
1        0.42
2        0.46
3        0.58
4        0.62
         ... 
53935    1.44
53936    1.44
53937    1.40
53938    1.72
53939    1.50
Name: carat, Length: 53940, dtype: float64
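
Multiplying a Series applies the operation to every value and returns a new Series; the original column is left alone unless you assign the result back. A small sketch, assuming a stand-in DataFrame `sample` and a made-up column name `carat_doubled`:

```python
import pandas as pd

# Stand-in data: the first five carat values shown above.
sample = pd.DataFrame({"carat": [0.23, 0.21, 0.23, 0.29, 0.31]})

doubled = sample["carat"] * 2        # new Series; sample["carat"] is unchanged
print(sample["carat"].tolist())      # still the original values

# To keep the result, store it in a column.
# "carat_doubled" is a hypothetical name chosen for this example.
sample["carat_doubled"] = doubled
print(sample.columns.tolist())
```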

In [7]:
## Bracket notation is safer: it also works when a column name contains spaces
df["carat"]

0        0.23
1        0.21
2        0.23
3        0.29
4        0.31
         ... 
53935    0.72
53936    0.72
53937    0.70
53938    0.86
53939    0.75
Name: carat, Length: 53940, dtype: float64
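
Bracket notation also covers the other two goals for today: slicing a DataFrame and storing a single value in a variable. The sketch below is one way to do it, assuming a stand-in DataFrame `sample` built from the first five rows shown above; the variable names are illustrative.

```python
import pandas as pd

# Stand-in for the diamonds data: three columns of the first five rows shown above.
sample = pd.DataFrame({
    "carat": [0.23, 0.21, 0.23, 0.29, 0.31],
    "diamond cut": ["Ideal", "Premium", "Good", "Premium", "Good"],
    "price": [326, 326, 327, 334, 335],
})

# A list of column names inside the brackets returns a smaller DataFrame.
subset = sample[["diamond cut", "price"]]

# .iloc slices rows by position: here the rows at positions 0, 1 and 2.
first_three = sample.iloc[0:3]

# .loc with a row label and a column name returns a single value.
first_cut = sample.loc[0, "diamond cut"]
print(first_cut)  # Ideal
```

Note that `.iloc` counts positions while `.loc` uses labels; with the default index the two happen to line up.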

In [10]:
df["diamond cut"]

0            Ideal
1          Premium
2             Good
3          Premium
4             Good
           ...    
53935        Ideal
53936         Good
53937    Very Good
53938      Premium
53939        Ideal
Name: diamond cut, Length: 53940, dtype: object

In [11]:
df["diamond cut"] == "Premium"

0        False
1         True
2        False
3         True
4        False
         ...  
53935    False
53936    False
53937    False
53938     True
53939    False
Name: diamond cut, Length: 53940, dtype: bool
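
A Series of True/False values like this one is usually called a boolean mask. Passing it back into the brackets keeps only the rows where it is True. A minimal sketch, assuming a stand-in DataFrame `sample` built from the first five rows shown above:

```python
import pandas as pd

# Stand-in for the diamonds data: three columns of the first five rows shown above.
sample = pd.DataFrame({
    "carat": [0.23, 0.21, 0.23, 0.29, 0.31],
    "diamond cut": ["Ideal", "Premium", "Good", "Premium", "Good"],
    "price": [326, 326, 327, 334, 335],
})

mask = sample["diamond cut"] == "Premium"   # boolean Series, one value per row
premium = sample[mask]                      # keeps only the rows where mask is True

print(mask.sum())                # 2, because True counts as 1
print(premium.index.tolist())    # [1, 3]
```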

In [12]:
df["diamond cut"].unique()

array(['Ideal', 'Premium', 'Good', 'Very Good', 'Fair'], dtype=object)

In [15]:
df["diamond cut"].value_counts().to_frame("number")

diamond cut,number
Ideal,21551
Premium,13791
Very Good,12082
Good,4906
Fair,1610
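
The pieces of that last cell can be explored one at a time. The sketch below assumes a stand-in Series `cuts` holding the first five values of the column shown above, so the counts are much smaller than in the full dataset.

```python
import pandas as pd

# Stand-in data: the first five "diamond cut" values shown above.
cuts = pd.Series(["Ideal", "Premium", "Good", "Premium", "Good"], name="diamond cut")

print(cuts.nunique())                          # number of distinct values

counts = cuts.value_counts()                   # Series: category -> how often it appears
table = counts.to_frame("number")              # same counts as a one-column DataFrame
shares = cuts.value_counts(normalize=True)     # proportions instead of raw counts

print(table.loc["Premium", "number"])          # 2
print(shares["Premium"])                       # 0.4
```

`value_counts()` returns a Series indexed by category, which is why `.to_frame("number")` is needed to turn it into a DataFrame with a named column.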
