# The Pandas library

**From the Pandas documentation:**

**pandas** is everyone's favorite data analysis library providing fast, flexible, and expressive data structures designed to work with *relational* or table-like data (such as a SQL table or an Excel spreadsheet). It is a fundamental high-level building block for doing practical, real-world data analysis in Python.

In [31]:
# The importing convention
# ! pip install openpyxl
import pandas as pd

# Opening (Reading) Files


## Reading Excel File

In [32]:
# forward slashes work on every platform and avoid the invalid "\s" escape sequence
filepath = "data/stock_data_simple.xlsx"

In [33]:
df = pd.read_excel(filepath)

## Reading CSV File

In [70]:
# forward slashes work on every platform and avoid the invalid "\s" escape sequence
filepath = "data/stock_data_simple.csv"

In [71]:
df = pd.read_csv(filepath)

# View/Display data

In [72]:
df.head()

Unnamed: 0,ticker,company_name,sector,trade_date,price,price_change_percent,market_capitalization,annual_sales,shares_outstanding
0,WMT,Wal-Mart Stores,Retail,1/16/2014,76.76,-1.20%,248377,55688,3235772
1,AAPL,Apple Inc,Technology,1/16/2014,554.25,-0.60%,494697,37472,892553
2,IBM,Intl Business Machines,Technology,1/16/2014,188.76,0.50%,204965,23720,1085854
3,BAC,Bank Of America Corp,Financial,1/16/2014,17.08,-0.40%,182177,23553,10666133
4,SGL.KR,Samsung Electronics,Technology,1/16/2014,1301000.0,0.20%,180329,23444,147299


In [73]:
df.tail()

Unnamed: 0,ticker,company_name,sector,trade_date,price,price_change_percent,market_capitalization,annual_sales,shares_outstanding
5,NESN.CH,Nestle 'R',Consumer Staple,1/16/2014,67.45,1.20%,239974,22584,3224798
6,MSFT,Microsoft Corp,Technology,1/16/2014,36.89,0.40%,307956,18529,8347968
7,AMZN,Amazon.Com Inc,Retail,1/16/2014,395.8,0.00%,181170,17092,457733
8,GOOG,Google Inc,Technology,1/16/2014,1156.22,0.70%,386278,14893,334087
9,PFE,Pfizer Inc,Health Care,1/16/2014,31.17,0.00%,202014,12643,6481070


In [76]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10 entries, 0 to 9
Data columns (total 9 columns):
 #   Column                 Non-Null Count  Dtype 
---  ------                 --------------  ----- 
 0   ticker                 10 non-null     object
 1   company_name           10 non-null     object
 2   sector                 10 non-null     object
 3   trade_date             10 non-null     object
 4   price                  10 non-null     object
 5   price_change_percent   10 non-null     object
 6   market_capitalization  10 non-null     object
 7   annual_sales           10 non-null     object
 8   shares_outstanding     10 non-null     int64 
dtypes: int64(1), object(8)
memory usage: 848.0+ bytes
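The `info()` output above lists most numeric-looking columns as `object`, which suggests they were read as text because the values contain thousands separators or a `%` sign. A minimal sketch of one way to handle the separators at read time, using the `thousands` parameter of `pd.read_csv`. The inline CSV text is a stand-in for the real file, with two rows copied from the output above; the variable names are illustrative only.

```python
import io
import pandas as pd

# Inline stand-in for the CSV file (assumption: the real file quotes
# values that contain commas, as a valid CSV must).
csv_text = '''ticker,price,market_capitalization
WMT,76.76,"248,377"
SGL.KR,"1,301,000.00","180,329"
'''

# Default parsing: values containing "," are kept as text.
raw = pd.read_csv(io.StringIO(csv_text))

# With thousands=",", the separators are stripped and the columns become numeric.
parsed = pd.read_csv(io.StringIO(csv_text), thousands=",")

print(raw.dtypes)
print(parsed.dtypes)
```

The `%` sign in `price_change_percent` is not covered by `thousands` and still needs a separate cleaning step.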


# The Pandas DataFrames

<img src="img/dataframe.png"><img src="img/excel_table.png">

If the structure seems familiar, it's because a DataFrame is very similar to a single Excel “Sheet”, but instead of referring to cells with references such as A1, you use the column names (or positions) and the row labels (or positions).

A DataFrame consists of three parts:

1. Index
2. Columns Names (Column Index)
3. Data
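To make the three parts concrete, here is a small sketch that builds a DataFrame from them explicitly. The two rows are taken from the dataset above; the name `small` is just illustrative.

```python
import pandas as pd

# The three parts, supplied separately.
data = [["WMT", "Retail"], ["AAPL", "Technology"]]   # 3. Data
row_index = [0, 1]                                   # 1. Index
column_names = ["ticker", "sector"]                  # 2. Column names

small = pd.DataFrame(data, index=row_index, columns=column_names)

print(small.index.tolist())    # the row labels
print(small.columns.tolist())  # the column labels
print(small.to_numpy())        # the underlying data as a NumPy array
```

`to_numpy()` returns the same array as the `.values` attribute for a simple frame like this one.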

In [77]:
df.index

RangeIndex(start=0, stop=10, step=1)

In [78]:
df.columns

Index(['ticker', 'company_name', 'sector', 'trade_date', 'price',
       'price_change_percent', 'market_capitalization', 'annual_sales',
       'shares_outstanding'],
      dtype='object')

In [79]:
df.values

array([['WMT', 'Wal-Mart Stores', 'Retail', '1/16/2014', '76.76',
        '-1.20%', '248,377', '55,688', 3235772],
       ['AAPL', 'Apple Inc', 'Technology', '1/16/2014', '554.25',
        '-0.60%', '494,697', '37,472', 892553],
       ['IBM', 'Intl Business Machines', 'Technology', '1/16/2014',
        '188.76', '0.50%', '204,965', '23,720', 1085854],
       ['BAC', 'Bank Of America Corp', 'Financial', '1/16/2014', '17.08',
        '-0.40%', '182,177', '23,553', 10666133],
       ['SGL.KR', 'Samsung Electronics', 'Technology', '1/16/2014',
        '1,301,000.00', '0.20%', '180,329', '23,444', 147299],
       ['NESN.CH', "Nestle 'R'", 'Consumer Staple', '1/16/2014', '67.45',
        '1.20%', '239,974', '22,584', 3224798],
       ['MSFT', 'Microsoft Corp', 'Technology', '1/16/2014', '36.89',
        '0.40%', '307,956', '18,529', 8347968],
       ['AMZN', 'Amazon.Com Inc', 'Retail', '1/16/2014', '395.8',
        '0.00%', '181,170', '17,092', 457733],
       ['GOOG', 'Google Inc', 'Techno

## Sort Data

In [80]:
df.sort_values("market_capitalization", ascending=False)

Unnamed: 0,ticker,company_name,sector,trade_date,price,price_change_percent,market_capitalization,annual_sales,shares_outstanding
1,AAPL,Apple Inc,Technology,1/16/2014,554.25,-0.60%,494697,37472,892553
8,GOOG,Google Inc,Technology,1/16/2014,1156.22,0.70%,386278,14893,334087
6,MSFT,Microsoft Corp,Technology,1/16/2014,36.89,0.40%,307956,18529,8347968
0,WMT,Wal-Mart Stores,Retail,1/16/2014,76.76,-1.20%,248377,55688,3235772
5,NESN.CH,Nestle 'R',Consumer Staple,1/16/2014,67.45,1.20%,239974,22584,3224798
2,IBM,Intl Business Machines,Technology,1/16/2014,188.76,0.50%,204965,23720,1085854
9,PFE,Pfizer Inc,Health Care,1/16/2014,31.17,0.00%,202014,12643,6481070
3,BAC,Bank Of America Corp,Financial,1/16/2014,17.08,-0.40%,182177,23553,10666133
7,AMZN,Amazon.Com Inc,Retail,1/16/2014,395.8,0.00%,181170,17092,457733
4,SGL.KR,Samsung Electronics,Technology,1/16/2014,1301000.0,0.20%,180329,23444,147299
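One caveat when sorting: if a column still holds text (as `market_capitalization` does before it is converted), `sort_values` compares the strings character by character rather than by numeric value. The sketch below uses a tiny Series to show the difference; the values "9,500" and "10,200" are made up for illustration.

```python
import pandas as pd

# Text values with thousands separators (two of them hypothetical).
caps = pd.Series(["9,500", "10,200", "180,329"])

# Sorting the strings: "9..." is considered larger than "1...".
as_text = caps.sort_values(ascending=False)

# Sorting after converting to integers gives the numeric order.
as_numbers = caps.str.replace(",", "").astype(int).sort_values(ascending=False)

print(as_text.tolist())
print(as_numbers.tolist())
```

In the dataset above every market capitalization has six digits, so the text order and the numeric order happen to agree.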


## Selecting Data (by Columns)

In [81]:
df['ticker']

0        WMT
1       AAPL
2        IBM
3        BAC
4     SGL.KR
5    NESN.CH
6       MSFT
7       AMZN
8       GOOG
9        PFE
Name: ticker, dtype: object

Alternatively, a column can be accessed as an attribute using its name:

In [82]:
df.ticker

0        WMT
1       AAPL
2        IBM
3        BAC
4     SGL.KR
5    NESN.CH
6       MSFT
7       AMZN
8       GOOG
9        PFE
Name: ticker, dtype: object
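Attribute access is convenient, but it only works when the column name is a valid Python identifier and does not clash with an existing DataFrame attribute or method. A short sketch with a hypothetical frame (the `count` and `market cap` columns are invented for the example):

```python
import pandas as pd

demo = pd.DataFrame({
    "ticker": ["WMT", "AAPL"],
    "count": [1, 2],                  # clashes with the DataFrame.count method
    "market cap": [248377, 494697],   # contains a space
})

# Both forms return the same Series for an ordinary name.
same = demo.ticker.equals(demo["ticker"])

# demo.count is the method, not the column; brackets are needed here.
count_column = demo["count"]

# A name with a space can only be reached with brackets.
cap_column = demo["market cap"]

print(same, callable(demo.count), cap_column.tolist())
```

Bracket notation always works, so it is the safer default in reusable code.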

In [83]:
# Getting more than one column
df[['ticker', 'company_name','sector']]

Unnamed: 0,ticker,company_name,sector
0,WMT,Wal-Mart Stores,Retail
1,AAPL,Apple Inc,Technology
2,IBM,Intl Business Machines,Technology
3,BAC,Bank Of America Corp,Financial
4,SGL.KR,Samsung Electronics,Technology
5,NESN.CH,Nestle 'R',Consumer Staple
6,MSFT,Microsoft Corp,Technology
7,AMZN,Amazon.Com Inc,Retail
8,GOOG,Google Inc,Technology
9,PFE,Pfizer Inc,Health Care


# Selecting Data

## Label vs. Location

### df.loc[row_labels, column_labels] - select data (rows and/or columns) with particular label(s).

Allowed inputs are:

* A single label, e.g. 5 or 'a', (note that 5 is interpreted as a label of the index, and never as an integer position along the index).

* A list or array of labels, e.g. ['a', 'b', 'c'].

* A slice object with labels, e.g. 'a':'f'.

* A boolean array of the same length as the axis being sliced, e.g. [True, False, True].

### df.iloc[row_positions, column_positions] - select data (rows and/or columns) at integer locations.

Allowed inputs are:

* An integer, e.g. 5.

* A list or array of integers, e.g. [4, 3, 0].

* A slice object with ints, e.g. 1:7.

* A boolean array.
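The difference between labels and positions is easiest to see when the index is not the default 0, 1, 2, ... The sketch below uses a hypothetical index of 10, 20, 30 (not part of the dataset) to contrast the two indexers, including how their slices treat the end point.

```python
import pandas as pd

# Row labels 10, 20, 30 sit at positions 0, 1, 2.
demo = pd.DataFrame({"ticker": ["WMT", "AAPL", "IBM"]}, index=[10, 20, 30])

by_label = demo.loc[10, "ticker"]    # row labeled 10
by_position = demo.iloc[0, 0]        # row at position 0, column at position 0

# Label slices include the end label; position slices exclude the end position.
label_slice = demo.loc[10:20]
position_slice = demo.iloc[0:1]

print(by_label, by_position)
print(len(label_slice), len(position_slice))
```

With the default RangeIndex used in this notebook, labels and positions coincide, which is why `df.loc[3]` and `df.iloc[3]` return the same row below.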


### Selecting a single row by label or position

In [84]:
# .loc selects by label; a single label returns a Series
df.loc[3]

ticker                                    BAC
company_name             Bank Of America Corp
sector                              Financial
trade_date                          1/16/2014
price                                   17.08
price_change_percent                   -0.40%
market_capitalization                 182,177
annual_sales                           23,553
shares_outstanding                   10666133
Name: 3, dtype: object

In [85]:
# .iloc selects by position; a single integer returns a Series
df.iloc[3]

ticker                                    BAC
company_name             Bank Of America Corp
sector                              Financial
trade_date                          1/16/2014
price                                   17.08
price_change_percent                   -0.40%
market_capitalization                 182,177
annual_sales                           23,553
shares_outstanding                   10666133
Name: 3, dtype: object

In [86]:
# passing a list instead returns a single-row DataFrame
df.loc[[3]]

Unnamed: 0,ticker,company_name,sector,trade_date,price,price_change_percent,market_capitalization,annual_sales,shares_outstanding
3,BAC,Bank Of America Corp,Financial,1/16/2014,17.08,-0.40%,182177,23553,10666133


### select the last row of the data frame - input: integer - output: series

In [87]:
df.iloc[-1]

ticker                           PFE
company_name              Pfizer Inc
sector                   Health Care
trade_date                 1/16/2014
price                          31.17
price_change_percent           0.00%
market_capitalization        202,014
annual_sales                  12,643
shares_outstanding           6481070
Name: 9, dtype: object

In [88]:
# select the last row of the data frame - input: list - output: data frame
df.iloc[[-1]]

Unnamed: 0,ticker,company_name,sector,trade_date,price,price_change_percent,market_capitalization,annual_sales,shares_outstanding
9,PFE,Pfizer Inc,Health Care,1/16/2014,31.17,0.00%,202014,12643,6481070


# Selecting multiple rows by position
To extract multiple rows by position, we pass either a list or a slice object to the .iloc[] indexer.

#### df.loc[row_labels, column_labels]

#### df.iloc[row_positions, column_positions]


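Integer slices with .iloc[] behave like NumPy/Python slices (the end position is excluded). Because this DataFrame uses the default RangeIndex, row labels and row positions coincide, so .loc[] and .iloc[] return the same rows for a list of integers: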

In [89]:
# Selecting several rows at once with a list
rows_to_select = [2, 3, 6, 7]

# using loc
df.loc[rows_to_select]

Unnamed: 0,ticker,company_name,sector,trade_date,price,price_change_percent,market_capitalization,annual_sales,shares_outstanding
2,IBM,Intl Business Machines,Technology,1/16/2014,188.76,0.50%,204965,23720,1085854
3,BAC,Bank Of America Corp,Financial,1/16/2014,17.08,-0.40%,182177,23553,10666133
6,MSFT,Microsoft Corp,Technology,1/16/2014,36.89,0.40%,307956,18529,8347968
7,AMZN,Amazon.Com Inc,Retail,1/16/2014,395.8,0.00%,181170,17092,457733


In [90]:
# we can also pick one column first and then select the same rows from it
df['ticker'].loc[rows_to_select]

2     IBM
3     BAC
6    MSFT
7    AMZN
Name: ticker, dtype: object

In [91]:
# using the iloc indexer
df.iloc[rows_to_select]

Unnamed: 0,ticker,company_name,sector,trade_date,price,price_change_percent,market_capitalization,annual_sales,shares_outstanding
2,IBM,Intl Business Machines,Technology,1/16/2014,188.76,0.50%,204965,23720,1085854
3,BAC,Bank Of America Corp,Financial,1/16/2014,17.08,-0.40%,182177,23553,10666133
6,MSFT,Microsoft Corp,Technology,1/16/2014,36.89,0.40%,307956,18529,8347968
7,AMZN,Amazon.Com Inc,Retail,1/16/2014,395.8,0.00%,181170,17092,457733


In [92]:
# select the first five rows of the dataframe using slice notation
df.iloc[0:5]

Unnamed: 0,ticker,company_name,sector,trade_date,price,price_change_percent,market_capitalization,annual_sales,shares_outstanding
0,WMT,Wal-Mart Stores,Retail,1/16/2014,76.76,-1.20%,248377,55688,3235772
1,AAPL,Apple Inc,Technology,1/16/2014,554.25,-0.60%,494697,37472,892553
2,IBM,Intl Business Machines,Technology,1/16/2014,188.76,0.50%,204965,23720,1085854
3,BAC,Bank Of America Corp,Financial,1/16/2014,17.08,-0.40%,182177,23553,10666133
4,SGL.KR,Samsung Electronics,Technology,1/16/2014,1301000.0,0.20%,180329,23444,147299


In [93]:
# Selecting a single row and multiple columns
# select the ticker, company_name, and trade_date of the row at position 1 (by position)
df.iloc[1, [0, 1, 3]]

# select the ticker, sector, and price of the row labeled 4 (by label)
df.loc[4, ['ticker', 'sector', 'price']]


ticker          SGL.KR
sector      Technology
price     1,301,000.00
Name: 4, dtype: object

### For getting a value (cell) explicitly:


In [94]:
df.iloc[2, 1]

'Intl Business Machines'

For getting fast access to a scalar (equivalent to the prior method):

In [95]:
df.iat[2, 1]

# For slicing rows explicitly:
df.iloc[1:3, :]

Unnamed: 0,ticker,company_name,sector,trade_date,price,price_change_percent,market_capitalization,annual_sales,shares_outstanding
1,AAPL,Apple Inc,Technology,1/16/2014,554.25,-0.60%,494697,37472,892553
2,IBM,Intl Business Machines,Technology,1/16/2014,188.76,0.50%,204965,23720,1085854


In [96]:
# The .iloc[] indexer is used to index a data frame by position.
df.iloc[2]

ticker                                      IBM
company_name             Intl Business Machines
sector                               Technology
trade_date                            1/16/2014
price                                    188.76
price_change_percent                      0.50%
market_capitalization                   204,965
annual_sales                             23,720
shares_outstanding                      1085854
Name: 2, dtype: object

In [97]:
# To extract multiple rows by position, we pass either a list or a slice object to the .iloc[] indexer.
df.iloc[[2]]

Unnamed: 0,ticker,company_name,sector,trade_date,price,price_change_percent,market_capitalization,annual_sales,shares_outstanding
2,IBM,Intl Business Machines,Technology,1/16/2014,188.76,0.50%,204965,23720,1085854


In [98]:
df.iloc[rows_to_select]

Unnamed: 0,ticker,company_name,sector,trade_date,price,price_change_percent,market_capitalization,annual_sales,shares_outstanding
2,IBM,Intl Business Machines,Technology,1/16/2014,188.76,0.50%,204965,23720,1085854
3,BAC,Bank Of America Corp,Financial,1/16/2014,17.08,-0.40%,182177,23553,10666133
6,MSFT,Microsoft Corp,Technology,1/16/2014,36.89,0.40%,307956,18529,8347968
7,AMZN,Amazon.Com Inc,Retail,1/16/2014,395.8,0.00%,181170,17092,457733


In [99]:
# Getting a single value
df.loc[1,'company_name']

'Apple Inc'
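For a single cell, pandas also offers the `.at` (label-based) and `.iat` (position-based) accessors, which are the scalar counterparts of `.loc` and `.iloc`. A minimal sketch on a two-row copy of the data; `demo` is an illustrative name.

```python
import pandas as pd

demo = pd.DataFrame({
    "ticker": ["WMT", "AAPL"],
    "company_name": ["Wal-Mart Stores", "Apple Inc"],
})

by_label = demo.at[1, "company_name"]   # same cell as demo.loc[1, "company_name"]
by_position = demo.iat[1, 1]            # same cell as demo.iloc[1, 1]

print(by_label, by_position)
```

Both accessors accept only a single row and a single column, so they cannot be used for slices or lists.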

See more in the "Selection by Label" section of the pandas indexing documentation.

* df.iloc is primarily integer position based (from 0 to length-1 of the axis), but may also be used with a boolean array. .iloc will raise IndexError if a requested indexer is out-of-bounds, except slice indexers which allow out-of-bounds indexing. (this conforms with Python/NumPy slice semantics). Allowed inputs are:

    * An integer e.g. 5.

    * A list or array of integers [4, 3, 0].

    * A slice object with ints 1:7.

    * A boolean array (any NA values will be treated as False).

    * A callable function with one argument (the calling Series or DataFrame) and that returns valid output for indexing (one of the above).
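The out-of-bounds rules and the callable form can be checked on a small frame. This is a sketch under the assumption of a three-row DataFrame; the names are illustrative.

```python
import pandas as pd

demo = pd.DataFrame({"ticker": ["WMT", "AAPL", "IBM"]})

# A single out-of-bounds position raises IndexError.
error_message = None
try:
    demo.iloc[5]
except IndexError as exc:
    error_message = str(exc)

# An out-of-bounds slice is simply clipped to the available rows.
clipped = demo.iloc[1:100]

# A callable receives the DataFrame and returns a valid indexer,
# here the positions of the last two rows.
last_two = demo.iloc[lambda frame: [len(frame) - 2, len(frame) - 1]]

print(error_message)
print(clipped["ticker"].tolist(), last_two["ticker"].tolist())
```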

It is also possible to select by position using the *iloc* method

## Describe data: quick statistical summary

In [100]:
df.describe()

Unnamed: 0,shares_outstanding
count,10.0
mean,3487327.0
std,3756962.0
min,147299.0
25%,566438.0
50%,2155326.0
75%,5669746.0
max,10666130.0
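The summary above covers only `shares_outstanding` because, by default, `describe()` reports on numeric columns, and at this point it is the only column with a numeric dtype. A sketch of how `include="all"` extends the summary to text columns as well, using a three-row subset of the data:

```python
import pandas as pd

demo = pd.DataFrame({
    "sector": ["Retail", "Technology", "Technology"],
    "shares_outstanding": [3235772, 892553, 1085854],
})

numeric_summary = demo.describe()            # numeric columns only
full_summary = demo.describe(include="all")  # adds unique/top/freq for text columns

print(numeric_summary.columns.tolist())
print(full_summary.loc[["count", "unique", "top", "freq"]])
```

Statistics that do not apply to a column (for example `mean` for text) appear as NaN in the combined table.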


## Transpose your data

In [101]:
df.transpose()

Unnamed: 0,0,1,2,3,4,5,6,7,8,9
ticker,WMT,AAPL,IBM,BAC,SGL.KR,NESN.CH,MSFT,AMZN,GOOG,PFE
company_name,Wal-Mart Stores,Apple Inc,Intl Business Machines,Bank Of America Corp,Samsung Electronics,Nestle 'R',Microsoft Corp,Amazon.Com Inc,Google Inc,Pfizer Inc
sector,Retail,Technology,Technology,Financial,Technology,Consumer Staple,Technology,Retail,Technology,Health Care
trade_date,1/16/2014,1/16/2014,1/16/2014,1/16/2014,1/16/2014,1/16/2014,1/16/2014,1/16/2014,1/16/2014,1/16/2014
price,76.76,554.25,188.76,17.08,1301000.00,67.45,36.89,395.8,1156.22,31.17
price_change_percent,-1.20%,-0.60%,0.50%,-0.40%,0.20%,1.20%,0.40%,0.00%,0.70%,0.00%
market_capitalization,248377,494697,204965,182177,180329,239974,307956,181170,386278,202014
annual_sales,55688,37472,23720,23553,23444,22584,18529,17092,14893,12643
shares_outstanding,3235772,892553,1085854,10666133,147299,3224798,8347968,457733,334087,6481070


## Answering simple questions about a dataset

### How many companies are there by Sector?

In [102]:
df['sector'].value_counts()

Technology         5
Retail             2
Consumer Staple    1
Health Care        1
Financial          1
Name: sector, dtype: int64

### What is the average Market Capitalization?

In [None]:
# remove the thousands separators, then convert the strings to integers
df['market_capitalization'] = df['market_capitalization'].str.replace(",", "").astype(int)

In [112]:
df['market_capitalization'].mean()

262793.7
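The same cleaning pattern can be applied to the other text columns before doing arithmetic on them. The following is a sketch for `price_change_percent` and `price`, using two values of each copied from the dataset; it is one possible approach rather than the notebook's own code.

```python
import pandas as pd

demo = pd.DataFrame({
    "price_change_percent": ["-1.20%", "0.50%"],
    "price": ["76.76", "1,301,000.00"],
})

# Drop the trailing "%" and convert to float (values remain in percent units).
demo["price_change_percent"] = (
    demo["price_change_percent"].str.rstrip("%").astype(float)
)

# Drop the thousands separators and convert to float.
demo["price"] = demo["price"].str.replace(",", "", regex=False).astype(float)

print(demo.dtypes)
print(demo)
```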

### What is the most frequent sector?

In [113]:
df['sector'].describe()

count             10
unique             5
top       Technology
freq               5
Name: sector, dtype: object

### Who are the 5 largest companies?

In [114]:
df.sort_values('market_capitalization', ascending=False)[:5]

Unnamed: 0,ticker,company_name,sector,trade_date,price,price_change_percent,market_capitalization,annual_sales,shares_outstanding
1,AAPL,Apple Inc,Technology,1/16/2014,554.25,-0.60%,494697,37472,892553
8,GOOG,Google Inc,Technology,1/16/2014,1156.22,0.70%,386278,14893,334087
6,MSFT,Microsoft Corp,Technology,1/16/2014,36.89,0.40%,307956,18529,8347968
0,WMT,Wal-Mart Stores,Retail,1/16/2014,76.76,-1.20%,248377,55688,3235772
5,NESN.CH,Nestle 'R',Consumer Staple,1/16/2014,67.45,1.20%,239974,22584,3224798
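An equivalent way to get the top rows is `DataFrame.nlargest`, which avoids sorting the whole frame by hand. A small sketch on four rows from the dataset, with the column already converted to integers:

```python
import pandas as pd

demo = pd.DataFrame({
    "ticker": ["WMT", "AAPL", "IBM", "GOOG"],
    "market_capitalization": [248377, 494697, 204965, 386278],
})

# The two largest companies by market capitalization.
top_two = demo.nlargest(2, "market_capitalization")

# Same rows as sorting in descending order and taking the head.
same_result = demo.sort_values("market_capitalization", ascending=False).head(2)

print(top_two)
```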


# Boolean Indexing
Another common operation is the use of boolean vectors to filter the data. The operators are: | for or, & for and, and ~ for not.
These must be grouped by using parentheses, since by default Python will evaluate an expression such as df['A'] > 2 & df['B'] < 3 as df['A'] > (2 & df['B']) < 3, while the desired evaluation order is (df['A'] > 2) & (df['B'] < 3).
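A short sketch of the three operators with the parentheses in place, on a hypothetical frame with columns `A` and `B` matching the expression discussed above:

```python
import pandas as pd

demo = pd.DataFrame({"A": [1, 3, 5], "B": [1, 2, 4]})

# and: both conditions must hold
both = demo[(demo["A"] > 2) & (demo["B"] < 3)]

# or: at least one condition must hold
either = demo[(demo["A"] > 2) | (demo["B"] < 3)]

# not: invert a condition
not_large_a = demo[~(demo["A"] > 2)]

print(both.index.tolist(), either.index.tolist(), not_large_a.index.tolist())
```

Each comparison produces a boolean Series, and the parentheses ensure the comparisons are evaluated before `&`, `|`, or `~` combine them.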

In [116]:
df[df.market_capitalization > 425010]

Unnamed: 0,ticker,company_name,sector,trade_date,price,price_change_percent,market_capitalization,annual_sales,shares_outstanding
1,AAPL,Apple Inc,Technology,1/16/2014,554.25,-0.60%,494697,37472,892553


In [117]:
# exclude the Technology sector
df[~(df.sector == "Technology")]

Unnamed: 0,ticker,company_name,sector,trade_date,price,price_change_percent,market_capitalization,annual_sales,shares_outstanding
0,WMT,Wal-Mart Stores,Retail,1/16/2014,76.76,-1.20%,248377,55688,3235772
3,BAC,Bank Of America Corp,Financial,1/16/2014,17.08,-0.40%,182177,23553,10666133
5,NESN.CH,Nestle 'R',Consumer Staple,1/16/2014,67.45,1.20%,239974,22584,3224798
7,AMZN,Amazon.Com Inc,Retail,1/16/2014,395.8,0.00%,181170,17092,457733
9,PFE,Pfizer Inc,Health Care,1/16/2014,31.17,0.00%,202014,12643,6481070


## Indexing with isin

In [118]:
df.ticker

0        WMT
1       AAPL
2        IBM
3        BAC
4     SGL.KR
5    NESN.CH
6       MSFT
7       AMZN
8       GOOG
9        PFE
Name: ticker, dtype: object

In [119]:
df.ticker.isin(["AAPL", "FB", "WBK", "DTEX.DE", "FOXA", "MSFT"])

0    False
1     True
2    False
3    False
4    False
5    False
6     True
7    False
8    False
9    False
Name: ticker, dtype: bool

In [120]:
df[df.ticker.isin(["AAPL", "FB", "WBK", "DTEX.DE", "FOXA", "MSFT"])]

Unnamed: 0,ticker,company_name,sector,trade_date,price,price_change_percent,market_capitalization,annual_sales,shares_outstanding
1,AAPL,Apple Inc,Technology,1/16/2014,554.25,-0.60%,494697,37472,892553
6,MSFT,Microsoft Corp,Technology,1/16/2014,36.89,0.40%,307956,18529,8347968


## Further questions

In [121]:
df.columns

Index(['ticker', 'company_name', 'sector', 'trade_date', 'price',
       'price_change_percent', 'market_capitalization', 'annual_sales',
       'shares_outstanding'],
      dtype='object')

### Give me the list of the companies in the Technology Sector

In [122]:
df['sector'] == 'Technology'

0    False
1     True
2     True
3    False
4     True
5    False
6     True
7    False
8     True
9    False
Name: sector, dtype: bool

We can use a boolean Series to index a Series or a DataFrame; this is called "masking" or boolean indexing.

In [123]:
df.loc[df['sector'] == 'Technology']

Unnamed: 0,ticker,company_name,sector,trade_date,price,price_change_percent,market_capitalization,annual_sales,shares_outstanding
1,AAPL,Apple Inc,Technology,1/16/2014,554.25,-0.60%,494697,37472,892553
2,IBM,Intl Business Machines,Technology,1/16/2014,188.76,0.50%,204965,23720,1085854
4,SGL.KR,Samsung Electronics,Technology,1/16/2014,1301000.0,0.20%,180329,23444,147299
6,MSFT,Microsoft Corp,Technology,1/16/2014,36.89,0.40%,307956,18529,8347968
8,GOOG,Google Inc,Technology,1/16/2014,1156.22,0.70%,386278,14893,334087


### Give me the list of the companies in the Technology Sector with a market capitalization above 307,956

In [125]:
df.loc[(df['sector'] == 'Technology') & (df['market_capitalization'] > 307956)]

Unnamed: 0,ticker,company_name,sector,trade_date,price,price_change_percent,market_capitalization,annual_sales,shares_outstanding
1,AAPL,Apple Inc,Technology,1/16/2014,554.25,-0.60%,494697,37472,892553
8,GOOG,Google Inc,Technology,1/16/2014,1156.22,0.70%,386278,14893,334087
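The same filter can also be written with `DataFrame.query`, which takes the condition as a string and some readers find easier to scan. A sketch on four rows from the dataset, assuming `market_capitalization` has already been converted to integers:

```python
import pandas as pd

demo = pd.DataFrame({
    "ticker": ["AAPL", "IBM", "MSFT", "GOOG"],
    "sector": ["Technology"] * 4,
    "market_capitalization": [494697, 204965, 307956, 386278],
})

# Boolean-mask version, as in the cell above.
masked = demo.loc[
    (demo["sector"] == "Technology") & (demo["market_capitalization"] > 307956)
]

# query version: column names are used directly inside the expression.
queried = demo.query("sector == 'Technology' and market_capitalization > 307956")

print(queried["ticker"].tolist())
```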


**Grouping operations**: Split-Apply-Combine operation.

By **grouping** or **group by** operations we are referring to a process involving one or more of the following steps:

- **Splitting** the data into groups based on some criteria
- **Applying** a function to each group independently
- **Combining** the results into a data structure


<img src="img/split_apply_combine.png">

<b>Step 1 (Split):</b> The <i>groupby</i> operation <b><i>splits</i></b> the dataframe into a group of dataframes based on some criteria. Note that the grouped object is <i>not</i> a dataframe. It is a GroupBy object. It has a dictionary-like structure and is also iterable.

<b>Step 2 (Apply):</b> Once we have a grouped object we can <b><i>apply</i></b> functions or run analysis on each group, a set of groups, or all groups.

<b>Step 3 (Combine):</b> We can also <b><i>combine</i></b> the results of the analysis into one or more new data structures.
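The three steps can be spelled out by hand to see what `groupby` does behind the scenes. The sketch below uses four rows from the dataset (market capitalization already numeric), computes the per-sector mean manually, and compares it with the built-in result; the variable names are illustrative.

```python
import pandas as pd

demo = pd.DataFrame({
    "sector": ["Retail", "Technology", "Technology", "Retail"],
    "market_capitalization": [248377, 494697, 204965, 181170],
})

# Split: the GroupBy object is iterable and yields (group name, sub-DataFrame).
grouped_demo = demo.groupby("sector")

# Apply: compute the mean of each group independently.
manual = {}
for sector_name, group_frame in grouped_demo:
    manual[sector_name] = group_frame["market_capitalization"].mean()

# Combine: collect the per-group results into one Series.
combined = pd.Series(manual, name="market_capitalization")

# The built-in aggregation performs all three steps in one call.
built_in = grouped_demo["market_capitalization"].mean()

print(combined)
print(built_in)
```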

Suppose we are only interested in the companies in the "Technology" and "Energy" sectors. Let's create a new DataFrame containing only those observations.

In [126]:
subset_of_interest = df.loc[(df['sector'] == "Technology") | (df['sector'] == "Energy")]

subset_of_interest.shape

(5, 9)

No company in this dataset belongs to the Energy sector, so only Technology appears in the new DataFrame:

In [127]:
subset_of_interest['sector'].value_counts()

Technology    5
Name: sector, dtype: int64

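Because sector is stored as plain strings (dtype object) rather than as a categorical, there are no unused categories to remove, and the counts are unchanged: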

In [128]:
subset_of_interest['sector'].value_counts()

Technology    5
Name: sector, dtype: int64

Now that we have only the companies we are interested in, we can compare across the variables we wanted. First let's split our new DataFrame into groups.

In [129]:
grouped = subset_of_interest.groupby('sector')

In [130]:
grouped.groups

{'Technology': [1, 2, 4, 6, 8]}

In [132]:
grouped.get_group('Technology').head()

Unnamed: 0,ticker,company_name,sector,trade_date,price,price_change_percent,market_capitalization,annual_sales,shares_outstanding
1,AAPL,Apple Inc,Technology,1/16/2014,554.25,-0.60%,494697,37472,892553
2,IBM,Intl Business Machines,Technology,1/16/2014,188.76,0.50%,204965,23720,1085854
4,SGL.KR,Samsung Electronics,Technology,1/16/2014,1301000.0,0.20%,180329,23444,147299
6,MSFT,Microsoft Corp,Technology,1/16/2014,36.89,0.40%,307956,18529,8347968
8,GOOG,Google Inc,Technology,1/16/2014,1156.22,0.70%,386278,14893,334087


#### Market Capitalization

In [158]:
grouped['market_capitalization']

<pandas.core.groupby.generic.SeriesGroupBy object at 0x000002833FBCFE80>

In [159]:
grouped['market_capitalization'].mean()

sector
Technology    314845
Name: market_capitalization, dtype: int32

In [160]:
grouped['market_capitalization'].describe()

Unnamed: 0_level_0,count,mean,std,min,25%,50%,75%,max
sector,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1
Technology,5.0,314845.0,130062.525704,180329.0,204965.0,307956.0,386278.0,494697.0


In [161]:
grouped['market_capitalization'].describe().unstack()

       sector    
count  Technology         5.000000
mean   Technology    314845.000000
std    Technology    130062.525704
min    Technology    180329.000000
25%    Technology    204965.000000
50%    Technology    307956.000000
75%    Technology    386278.000000
max    Technology    494697.000000
dtype: float64
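When only a few statistics are needed, `agg` with a list of function names is a lighter alternative to `describe()`. A sketch on four rows from the dataset; the choice of `mean`, `max`, and `count` is just an example.

```python
import pandas as pd

demo = pd.DataFrame({
    "sector": ["Retail", "Technology", "Technology", "Retail"],
    "market_capitalization": [248377, 494697, 204965, 181170],
})

# One row per sector, one column per requested statistic.
summary = demo.groupby("sector")["market_capitalization"].agg(["mean", "max", "count"])

print(summary)
```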

#### Price Change Percent

In [162]:
grouped['price_change_percent'].value_counts().unstack()

price_change_percent,-0.60%,0.20%,0.40%,0.50%,0.70%
sector,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1
Technology,1,1,1,1,1


In [163]:
100 * grouped['price_change_percent'].value_counts(normalize=True).unstack()

price_change_percent,-0.60%,0.20%,0.40%,0.50%,0.70%
sector,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1
Technology,20.0,20.0,20.0,20.0,20.0


#### Annual Sales

In [164]:
grouped['annual_sales'].describe().unstack()

        sector    
count   Technology         5
unique  Technology         5
top     Technology    14,893
freq    Technology         1
dtype: object