# Introduction to DataFrames

In the previous part, we covered the most common and fundamental attributes and methods for a Series. This chapter begins our coverage of the analogous operations for DataFrames.

## DataFrames vs Series

DataFrames and Series are extremely similar objects. A Series is a single dimension of data and is usually formed from a single column of a DataFrame. A Series has an index and values, but no columns. A DataFrame can be thought of as a collection of Series objects, one per column. Each is directly accessible by placing the column name within *just the brackets*.
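The sketch below uses a small, made-up DataFrame to make this concrete. The values are copied from the first rows of the bikes data, but the index labels `'a'`, `'b'`, `'c'` are hypothetical. Selecting one column with just the brackets returns a Series that keeps the DataFrame's index.

```python
import pandas as pd

# Made-up data; the index labels are hypothetical
df = pd.DataFrame(
    {"gender": ["Male", "Male", "Male"], "tripduration": [993, 623, 1040]},
    index=["a", "b", "c"],
)

# A single column name in just the brackets returns a Series
trip = df["tripduration"]
print(type(trip).__name__)          # Series
print(trip.index.equals(df.index))  # True: the Series shares the DataFrame's index
print(trip.ndim, df.ndim)           # 1 2
```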

### View the API for a complete list of functionality

As we did for Series, it can be helpful to see the entire list of attributes and methods available to DataFrames. Visit the [DataFrame section][1] of the API to see all of its functionality.

### Best of DataFrame API

DataFrames have an abundance of attributes and methods, and many of them duplicate functionality that is already available elsewhere in the library. We focus on the core attributes and methods that give you the most power to complete a data analysis.

### Minimally Sufficient Pandas

As a gentle reminder, I recommend that you stick with a minimal subset of pandas when doing an analysis. Using more obscure methods does not make you a better analyst. The point of a data analysis is to clearly expose the information held within the data. Just about everything that you want to do can be clearly expressed with minimal pandas syntax.

### Bikes dataset

We will use the bikes dataset to introduce the core attributes and methods of DataFrames.

[1]: http://pandas.pydata.org/pandas-docs/stable/reference/frame.html

In [1]:
import pandas as pd
bikes = pd.read_csv('../data/bikes.csv', parse_dates=['starttime', 'stoptime'])
bikes.head(3)

| | gender | starttime | stoptime | tripduration | from_station_name | start_capacity | to_station_name | end_capacity | temperature | wind_speed | events |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Male | 2013-06-28 19:01:00 | 2013-06-28 19:17:00 | 993 | Lake Shore Dr & Monroe St | 11.0 | Michigan Ave & Oak St | 15.0 | 73.9 | 12.7 | mostlycloudy |
| 1 | Male | 2013-06-28 22:53:00 | 2013-06-28 23:03:00 | 623 | Clinton St & Washington Blvd | 31.0 | Wells St & Walton St | 19.0 | 69.1 | 6.9 | partlycloudy |
| 2 | Male | 2013-06-30 14:43:00 | 2013-06-30 15:01:00 | 1040 | Sheffield Ave & Kingsbury St | 15.0 | Dearborn St & Monroe St | 23.0 | 73.0 | 16.1 | mostlycloudy |


## Core DataFrame attributes

The core DataFrame attributes that you need to know are listed below.

* `index`
* `columns`
* `values`
* `dtypes`
* `shape`
* `size`

### Review the index and columns

We've discussed the index and columns extensively before this chapter. As a review, the purpose of the index is to label each row, just as the purpose of the columns is to label each column. The index and columns are not data. They are labels for the data. If no index is set when the data is read, pandas uses the default `RangeIndex`, which defines a sequence of consecutive integers beginning at 0.

In [2]:
bikes.index

RangeIndex(start=0, stop=50089, step=1)

Let's access the column names with the `columns` attribute.

In [3]:
bikes.columns

Index(['gender', 'starttime', 'stoptime', 'tripduration', 'from_station_name',
       'start_capacity', 'to_station_name', 'end_capacity', 'temperature',
       'wind_speed', 'events'],
      dtype='object')
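The sketch below reads a tiny in-memory CSV twice. It stands in for a real file, and the station names and capacities are only illustrative. The first read sets no index, so pandas falls back to the default `RangeIndex`. The second uses the `index_col` parameter of `read_csv`, which turns a column into row labels.

```python
from io import StringIO

import pandas as pd

# A tiny in-memory CSV standing in for a file on disk
csv_text = "station,capacity\nLake Shore Dr & Monroe St,11\nClinton St & Washington Blvd,31\n"

# No index set during the read: pandas uses a RangeIndex starting at 0
df_default = pd.read_csv(StringIO(csv_text))
print(df_default.index)              # RangeIndex(start=0, stop=2, step=1)

# index_col turns the station column into row labels; only capacity remains a column
df_labeled = pd.read_csv(StringIO(csv_text), index_col="station")
print(df_labeled.index.tolist())
print(df_labeled.columns.tolist())   # ['capacity']
```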

### `values` returns a 2-D numpy array

The `values` attribute returns a two-dimensional numpy array of all the column values. It is like a DataFrame with no index or columns.

In [4]:
bikes.values

array([['Male', Timestamp('2013-06-28 19:01:00'),
        Timestamp('2013-06-28 19:17:00'), ..., 73.9, 12.7,
        'mostlycloudy'],
       ['Male', Timestamp('2013-06-28 22:53:00'),
        Timestamp('2013-06-28 23:03:00'), ..., 69.1, 6.9, 'partlycloudy'],
       ['Male', Timestamp('2013-06-30 14:43:00'),
        Timestamp('2013-06-30 15:01:00'), ..., 73.0, 16.1,
        'mostlycloudy'],
       ...,
       ['Male', Timestamp('2017-12-30 13:34:00'),
        Timestamp('2017-12-30 13:48:00'), ..., 5.0, 16.1, 'partlycloudy'],
       ['Female', Timestamp('2017-12-31 09:30:00'),
        Timestamp('2017-12-31 09:33:00'), ..., 7.0, 11.5, 'partlycloudy'],
       ['Male', Timestamp('2017-12-31 15:22:00'),
        Timestamp('2017-12-31 15:26:00'), ..., 10.9, 15.0,
        'partlycloudy']], shape=(50089, 11), dtype=object)
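Here is a rough illustration on a made-up two-row DataFrame. `values` returns a two-dimensional array with the row and column labels stripped away. The pandas documentation recommends the `to_numpy` method over `values`, and for a DataFrame like this one the two return the same contents.

```python
import pandas as pd

# Made-up data with one string (object) column and one integer column
df = pd.DataFrame({
    "gender": pd.Series(["Male", "Female"], dtype=object),
    "tripduration": [993, 623],
})

arr = df.values
print(arr.ndim, arr.shape)   # 2 (2, 2)
print(arr.dtype)             # object, since the string column forces a common type

# to_numpy returns the same contents as values
print((df.to_numpy() == arr).all())   # True
```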

### Advanced discussion on the `values` attribute

The `values` attribute always returns a single numpy array, which may lead you to believe that pandas stores its data as a single numpy array. This isn't the case. Internally, pandas generally stores a separate two-dimensional numpy array for each data type in the DataFrame. For instance, if a DataFrame contains integers, floats, strings, and datetimes, then pandas will typically have four separate numpy arrays, one for each of those data types. The reason for this is that a numpy array can hold only one specific data type. In other words, numpy arrays contain **homogeneous** data. The exception is the 'object' numpy data type, which can contain any Python object.

Whenever you access the `values` attribute of a DataFrame, pandas concatenates all of the numpy arrays together to return a single array. Notice that the data type of the resulting `bikes.values` array is object. This is because the `bikes` DataFrame contains string columns, and object is the only valid numpy data type that allows both numeric and string values.

If you had a DataFrame consisting only of integers and floats, then the `values` attribute would return an array with a data type of float. When you access the `values` attribute, pandas chooses a single data type that can hold every column, which for integers and floats is float64. Be aware that this conversion is not always lossless: a 64-bit float cannot exactly represent every integer larger than `2**53`. Below, an integer column (`tripduration`) and a float column (`start_capacity`) are selected, and then the `values` attribute is accessed to return a numpy array with a float data type.

In [7]:
bikes[['tripduration', 'start_capacity']].values.dtype

dtype('float64')
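The following sketch uses hypothetical columns named after `tripduration` and `start_capacity` to show the upcasting rules described above. It also shows the precision caveat: an integer just above `2**53` does not survive the conversion to float64 intact.

```python
import pandas as pd

# Hypothetical int64 and float64 columns
nums = pd.DataFrame({"tripduration": [993, 623], "start_capacity": [11.0, 31.0]})
print(nums.values.dtype)    # float64: the integers are upcast to floats

# With a string column added, object is the only common type
mixed = nums.assign(gender=["Male", "Male"])
print(mixed.values.dtype)   # object

# Caveat: float64 cannot represent 2**53 + 1 exactly, so it is rounded
big = pd.DataFrame({"id": [2**53 + 1], "x": [0.5]})
print(int(big.values[0, 0]) == 2**53 + 1)   # False
```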

### `shape` returns a tuple of the number of rows and columns

The `shape` attribute returns a Python tuple of length 2 containing the number of rows and columns.

In [8]:
shape = bikes.shape
shape

(50089, 11)

You can get the number of rows or columns as an integer by selecting them from the tuple.

In [9]:
shape[0]

50089

It isn't necessary to assign the tuple to a variable beforehand.

In [10]:
bikes.shape[1]

11
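Because `shape` is an ordinary tuple, a common idiom is to unpack it into two variables in a single line. The DataFrame here is a made-up example.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": [4.0, 5.0, 6.0]})

# shape is a plain tuple, so it can be unpacked into two names at once
n_rows, n_cols = df.shape
print(n_rows, n_cols)   # 3 2
```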

### `size` returns the total number of elements in the DataFrame

The `size` attribute can be confusing because it does not return the number of rows. Instead, it returns the total number of elements in the DataFrame, which is the number of rows multiplied by the number of columns.

In [11]:
bikes.size

550979

We can verify this by multiplying the number of rows and columns together.

In [12]:
shape[0] * shape[1]

550979
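A small made-up example confirms the same relationship. It also shows that missing values still count toward `size`, and that the `size` of a Series is simply its length.

```python
import pandas as pd

# Made-up data with one missing value
df = pd.DataFrame({"a": [1, 2, 3], "b": [4.0, None, 6.0]})

print(df.size)                                # 6 = 3 rows * 2 columns
print(df.size == df.shape[0] * df.shape[1])   # True
print(df["b"].size)                           # 3: the missing value still counts
```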

### The `len` function returns the number of rows

Passing in the DataFrame to the built-in Python `len` function returns the number of rows.

In [13]:
len(bikes)

50089

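The `len` function counts every row, including rows with missing values. In contrast, the `count` method returns the number of non-missing values in each column. Here, `start_capacity` and `end_capacity` each have a few missing values.

In [14]: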
bikes.count()

gender               50089
starttime            50089
stoptime             50089
tripduration         50089
from_station_name    50089
start_capacity       50083
to_station_name      50089
end_capacity         50077
temperature          50089
wind_speed           50089
events               50089
dtype: int64
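To contrast `len` with `count` on something small enough to check by eye, here is a made-up DataFrame with one missing value in each of two columns. The column names are borrowed from the bikes data.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "start_capacity": [11.0, np.nan, 15.0],
    "end_capacity": [15.0, 19.0, np.nan],
    "events": ["clear", "cloudy", "clear"],
})

print(len(df))      # 3: every row, regardless of missing values
print(df.count())   # non-missing values per column: 2, 2, 3
```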

The `dtypes` attribute returns a Series containing the data type of each column.

In [26]:
bikes.dtypes

gender                       object
starttime            datetime64[ns]
stoptime             datetime64[ns]
tripduration                  int64
from_station_name            object
start_capacity              float64
to_station_name              object
end_capacity                float64
temperature                 float64
wind_speed                  float64
events                       object
dtype: object
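Because `dtypes` is itself a Series indexed by column name, you can look up or filter it like any other Series. Here is a minimal sketch on a made-up DataFrame. The exact datetime resolution shown (for example `datetime64[ns]`) may depend on your pandas version.

```python
import pandas as pd

df = pd.DataFrame({
    "gender": pd.Series(["Male", "Female"], dtype=object),
    "starttime": pd.to_datetime(["2013-06-28 19:01", "2013-06-28 22:53"]),
    "tripduration": [993, 623],
    "temperature": [73.9, 69.1],
})

dtypes = df.dtypes               # a Series mapping column name -> data type
print(dtypes)
print(dtypes["tripduration"])    # int64

# Filter the Series to find the names of the float64 columns
print(dtypes[dtypes == "float64"].index.tolist())   # ['temperature']
```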

## Arithmetic DataFrame operations

We now cover the arithmetic operators `+`, `-`, `*`, `/`, `**`, `//`, `%` on a DataFrame. Let's say we have a DataFrame, `df`, and execute `df + 5`. pandas attempts to add 5 to each value in the DataFrame. This operation succeeds only if all the columns are numeric (or boolean).

### Attempt to add 5 to bikes

If we attempt to add 5 to `bikes`, we get an error, because there is a mix of numeric, object, and datetime columns. Adding the integer 5 to a string or datetime column is not supported, so pandas raises a `TypeError`.

In [15]:
bikes

| | gender | starttime | stoptime | tripduration | from_station_name | start_capacity | to_station_name | end_capacity | temperature | wind_speed | events |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Male | 2013-06-28 19:01:00 | 2013-06-28 19:17:00 | 993 | Lake Shore Dr & Monroe St | 11.0 | Michigan Ave & Oak St | 15.0 | 73.9 | 12.7 | mostlycloudy |
| 1 | Male | 2013-06-28 22:53:00 | 2013-06-28 23:03:00 | 623 | Clinton St & Washington Blvd | 31.0 | Wells St & Walton St | 19.0 | 69.1 | 6.9 | partlycloudy |
| 2 | Male | 2013-06-30 14:43:00 | 2013-06-30 15:01:00 | 1040 | Sheffield Ave & Kingsbury St | 15.0 | Dearborn St & Monroe St | 23.0 | 73.0 | 16.1 | mostlycloudy |
| 3 | Male | 2013-07-01 10:05:00 | 2013-07-01 10:16:00 | 667 | Carpenter St & Huron St | 19.0 | Clark St & Randolph St | 31.0 | 72.0 | 16.1 | mostlycloudy |
| 4 | Male | 2013-07-01 11:16:00 | 2013-07-01 11:18:00 | 130 | Damen Ave & Pierce Ave | 19.0 | Damen Ave & Pierce Ave | 19.0 | 73.0 | 17.3 | partlycloudy |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 50084 | Male | 2017-12-30 13:07:00 | 2017-12-30 13:34:00 | 1625 | State St & Pearson St | 27.0 | Clark St & Elm St | 27.0 | 5.0 | 16.1 | partlycloudy |
| 50085 | Male | 2017-12-30 13:34:00 | 2017-12-30 13:44:00 | 585 | Halsted St & 35th St (*) | 16.0 | Union Ave & Root St | 11.0 | 5.0 | 16.1 | partlycloudy |
| 50086 | Male | 2017-12-30 13:34:00 | 2017-12-30 13:48:00 | 824 | Kingsbury St & Kinzie St | 31.0 | Halsted St & Blackhawk St (*) | 20.0 | 5.0 | 16.1 | partlycloudy |
| 50087 | Female | 2017-12-31 09:30:00 | 2017-12-31 09:33:00 | 178 | Clinton St & Lake St | 23.0 | Kingsbury St & Kinzie St | 31.0 | 7.0 | 11.5 | partlycloudy |


In [16]:
bikes + 5

TypeError: Addition/subtraction of integers and integer-arrays with DatetimeArray is no longer supported.  Instead of adding/subtracting `n`, use `n * obj.freq`
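Here is a minimal reproduction on a made-up two-column DataFrame. Adding an integer raises a `TypeError` because of the datetime column. If you actually want to shift datetimes, one option is to add a `pd.Timedelta` to the datetime column on its own, as sketched here.

```python
import pandas as pd

df = pd.DataFrame({
    "starttime": pd.to_datetime(["2013-06-28 19:01", "2013-06-28 22:53"]),
    "tripduration": [993, 623],
})

raised = False
try:
    df + 5   # fails on the datetime column
except TypeError as err:
    raised = True
    print("TypeError:", err)

# A datetime column accepts a Timedelta instead of a bare integer
shifted = df["starttime"] + pd.Timedelta(minutes=5)
print(shifted)
```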

### Select only numeric data with `select_dtypes`

DataFrames have a method unique to them called `select_dtypes`, which selects the subset of columns that have the given data type. Pass it the data type you want to select as a string. For example, pass it the string `'int64'` to select all of the 64-bit integer columns.

In [17]:
bikes.select_dtypes('int64').head(3)

| | tripduration |
|---|---|
| 0 | 993 |
| 1 | 623 |
| 2 | 1040 |


Select all of the 64-bit float columns by passing it the string `'float64'`.

In [18]:
bikes.select_dtypes('float64').head(3)

| | start_capacity | end_capacity | temperature | wind_speed |
|---|---|---|---|---|
| 0 | 11.0 | 15.0 | 73.9 | 12.7 |
| 1 | 31.0 | 19.0 | 69.1 | 6.9 |
| 2 | 15.0 | 23.0 | 73.0 | 16.1 |


Columns of any data type can be selected this way. You can also select columns of multiple data types by passing a list. Here, we select both 64-bit integers and floats.

In [19]:
bikes.select_dtypes(['int64', 'float64']).head(3)

| | tripduration | start_capacity | end_capacity | temperature | wind_speed |
|---|---|---|---|---|---|
| 0 | 993 | 11.0 | 15.0 | 73.9 | 12.7 |
| 1 | 623 | 31.0 | 19.0 | 69.1 | 6.9 |
| 2 | 1040 | 15.0 | 23.0 | 73.0 | 16.1 |


Integers and floats of different bit sizes are considered different data types, and you will need to use their exact names to select them with this method. For instance, `'int16'` selects all 16-bit integer columns. A deep dive into all of pandas' data types is found in the **Data Types** part of the book.
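The exact-name rule can be seen on a made-up DataFrame that has both a 16-bit and a 64-bit integer column. The sketch also uses the `exclude` parameter of `select_dtypes`, which keeps every column except those of the listed types.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "small": np.array([1, 2], dtype="int16"),
    "big": np.array([10, 20], dtype="int64"),
    "ratio": [0.5, 0.25],
    "label": pd.Series(["x", "y"], dtype=object),
})

print(df.select_dtypes("int64").columns.tolist())   # ['big']: int16 is a different type
print(df.select_dtypes("int16").columns.tolist())   # ['small']

# exclude drops the listed types and keeps the rest
print(df.select_dtypes(exclude=["object"]).columns.tolist())   # ['small', 'big', 'ratio']
```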

### Use the string 'number' to select all numeric data

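pandas offers a shortcut to select all numeric data types (integers and floats) with the string `'number'`. Similarly, the string `'datetime'` selects all datetime columns. Below, we first select the datetime columns and assign them to `bikes_dt`, which we use again later in this chapter. We then select the numeric columns and assign them to `bikes_number`.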

In [27]:
bikes_dt = bikes.select_dtypes('datetime')
bikes_dt.head(3)

| | starttime | stoptime |
|---|---|---|
| 0 | 2013-06-28 19:01:00 | 2013-06-28 19:17:00 |
| 1 | 2013-06-28 22:53:00 | 2013-06-28 23:03:00 |
| 2 | 2013-06-30 14:43:00 | 2013-06-30 15:01:00 |


In [20]:
bikes_number = bikes.select_dtypes('number')
bikes_number.head(3)

| | tripduration | start_capacity | end_capacity | temperature | wind_speed |
|---|---|---|---|---|---|
| 0 | 993 | 11.0 | 15.0 | 73.9 | 12.7 |
| 1 | 623 | 31.0 | 19.0 | 69.1 | 6.9 |
| 2 | 1040 | 15.0 | 23.0 | 73.0 | 16.1 |
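On a small made-up DataFrame, the `'number'` and `'datetime'` shortcuts pick out the same kinds of columns as above:

```python
import pandas as pd

df = pd.DataFrame({
    "starttime": pd.to_datetime(["2013-06-28 19:01", "2013-06-28 22:53"]),
    "tripduration": [993, 623],
    "temperature": [73.9, 69.1],
    "events": pd.Series(["mostlycloudy", "partlycloudy"], dtype=object),
})

print(df.select_dtypes("number").columns.tolist())     # ['tripduration', 'temperature']
print(df.select_dtypes("datetime").columns.tolist())   # ['starttime']
```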


### Add 5 to `bikes_number`

Now that we have a DataFrame consisting entirely of numeric data, we can successfully add 5 to it.

In [21]:
(bikes_number + 5).head(3)

| | tripduration | start_capacity | end_capacity | temperature | wind_speed |
|---|---|---|---|---|---|
| 0 | 998 | 16.0 | 20.0 | 78.9 | 17.7 |
| 1 | 628 | 36.0 | 24.0 | 74.1 | 11.9 |
| 2 | 1045 | 20.0 | 28.0 | 78.0 | 21.1 |


### Addition and multiplication with string columns

Addition also works with string columns: adding a string appends it to the end of each value. You can also multiply by an integer to repeat each string value that many times. We first select all of the string (object) columns and assign the result to a new variable name.

In [22]:
bikes_strings = bikes.select_dtypes('object')
bikes_strings.head(3)

| | gender | from_station_name | to_station_name | events |
|---|---|---|---|---|
| 0 | Male | Lake Shore Dr & Monroe St | Michigan Ave & Oak St | mostlycloudy |
| 1 | Male | Clinton St & Washington Blvd | Wells St & Walton St | partlycloudy |
| 2 | Male | Sheffield Ave & Kingsbury St | Dearborn St & Monroe St | mostlycloudy |


The addition operator can now be used to append a string to each value in the DataFrame.

In [23]:
(bikes_strings + ' SOMESTRING').head(3)

| | gender | from_station_name | to_station_name | events |
|---|---|---|---|---|
| 0 | Male SOMESTRING | Lake Shore Dr & Monroe St SOMESTRING | Michigan Ave & Oak St SOMESTRING | mostlycloudy SOMESTRING |
| 1 | Male SOMESTRING | Clinton St & Washington Blvd SOMESTRING | Wells St & Walton St SOMESTRING | partlycloudy SOMESTRING |
| 2 | Male SOMESTRING | Sheffield Ave & Kingsbury St SOMESTRING | Dearborn St & Monroe St SOMESTRING | mostlycloudy SOMESTRING |


Similarly, multiplying by an integer repeats each string value. Here, each value is repeated three times.

In [24]:
(bikes_strings * 3).head(3)

| | gender | from_station_name | to_station_name | events |
|---|---|---|---|---|
| 0 | MaleMaleMale | Lake Shore Dr & Monroe StLake Shore Dr & Monro... | Michigan Ave & Oak StMichigan Ave & Oak StMich... | mostlycloudymostlycloudymostlycloudy |
| 1 | MaleMaleMale | Clinton St & Washington BlvdClinton St & Washi... | Wells St & Walton StWells St & Walton StWells ... | partlycloudypartlycloudypartlycloudy |
| 2 | MaleMaleMale | Sheffield Ave & Kingsbury StSheffield Ave & Ki... | Dearborn St & Monroe StDearborn St & Monroe St... | mostlycloudymostlycloudymostlycloudy |
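Here is the same string arithmetic on a tiny made-up DataFrame. The columns are created with `dtype=object` to mirror the object columns selected from `bikes`.

```python
import pandas as pd

# dtype=object mirrors the object string columns selected above
strings = pd.DataFrame(
    {"gender": ["Male", "Female"], "events": ["clear", "rain"]}, dtype=object
)

print(strings + "!")   # appends "!" to every value
print(strings * 2)     # repeats every value twice
```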


## DataFrame comparison operators

The comparison operators (`<`, `<=`, `>`, `>=`, `==`, `!=`) work similarly to the arithmetic ones and return a DataFrame of all boolean columns. Here, we test whether each value in the DataFrame is greater than 5.

In [35]:
(bikes_number > 5).head(3)

| | tripduration | start_capacity | end_capacity | temperature | wind_speed |
|---|---|---|---|---|---|
| 0 | True | True | True | True | True |
| 1 | True | True | True | True | True |
| 2 | True | True | True | True | True |


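Comparisons also work on datetime columns once you extract a component, such as the year, to compare against a value. Below, `apply` passes each column of `bikes_dt` to the lambda. The result is a boolean DataFrame that is `True` wherever the year is 2017. Indexing a DataFrame with a boolean DataFrame keeps the values where the mask is `True` and replaces the rest with missing values. `dropna` then drops every row where either time falls outside of 2017.

In [50]:
mask_df = bikes_dt.apply(lambda col: col.dt.year == 2017)

# mask_df.all(axis=1) would instead give a boolean Series for selecting whole rows of bikes
bikes_dt[mask_df].dropna()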

| | starttime | stoptime |
|---|---|---|
| 35223 | 2017-01-01 07:01:08 | 2017-01-01 07:04:01 |
| 35224 | 2017-01-01 12:50:39 | 2017-01-01 12:58:30 |
| 35225 | 2017-01-01 16:14:24 | 2017-01-01 16:21:25 |
| 35226 | 2017-01-01 16:57:44 | 2017-01-01 17:00:55 |
| 35227 | 2017-01-01 17:14:11 | 2017-01-01 17:24:29 |
| ... | ... | ... |
| 50084 | 2017-12-30 13:07:00 | 2017-12-30 13:34:00 |
| 50085 | 2017-12-30 13:34:00 | 2017-12-30 13:44:00 |
| 50086 | 2017-12-30 13:34:00 | 2017-12-30 13:48:00 |
| 50087 | 2017-12-31 09:30:00 | 2017-12-31 09:33:00 |


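The same rows can be selected from the full `bikes` DataFrame with boolean Series instead. Here, `dt.year.eq(2017)` tests each time column, and `&` requires both conditions to be true.

In [48]: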
bikes[bikes['starttime'].dt.year.eq(2017) & bikes['stoptime'].dt.year.eq(2017)]

| | gender | starttime | stoptime | tripduration | from_station_name | start_capacity | to_station_name | end_capacity | temperature | wind_speed | events |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 35223 | Male | 2017-01-01 07:01:08 | 2017-01-01 07:04:01 | 173 | Loomis St & Lexington St | 19.0 | Ashland Ave & Harrison St | 23.0 | 19.9 | 5.8 | clear |
| 35224 | Female | 2017-01-01 12:50:39 | 2017-01-01 12:58:30 | 471 | Michigan Ave & Jackson Blvd | 23.0 | Indiana Ave & Roosevelt Rd | 39.0 | 37.0 | 9.2 | partlycloudy |
| 35225 | Male | 2017-01-01 16:14:24 | 2017-01-01 16:21:25 | 421 | California Ave & Francis Pl | 15.0 | Kedzie Ave & Milwaukee Ave | 27.0 | 33.1 | 0.0 | mostlycloudy |
| 35226 | Male | 2017-01-01 16:57:44 | 2017-01-01 17:00:55 | 191 | Lincoln Ave & Diversey Pkwy | 15.0 | Racine Ave & Fullerton Ave | 19.0 | 33.1 | 0.0 | mostlycloudy |
| 35227 | Male | 2017-01-01 17:14:11 | 2017-01-01 17:24:29 | 618 | Wabash Ave & Grand Ave | 15.0 | Franklin St & Monroe St | 27.0 | 33.1 | 0.0 | mostlycloudy |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 50084 | Male | 2017-12-30 13:07:00 | 2017-12-30 13:34:00 | 1625 | State St & Pearson St | 27.0 | Clark St & Elm St | 27.0 | 5.0 | 16.1 | partlycloudy |
| 50085 | Male | 2017-12-30 13:34:00 | 2017-12-30 13:44:00 | 585 | Halsted St & 35th St (*) | 16.0 | Union Ave & Root St | 11.0 | 5.0 | 16.1 | partlycloudy |
| 50086 | Male | 2017-12-30 13:34:00 | 2017-12-30 13:48:00 | 824 | Kingsbury St & Kinzie St | 31.0 | Halsted St & Blackhawk St (*) | 20.0 | 5.0 | 16.1 | partlycloudy |
| 50087 | Female | 2017-12-31 09:30:00 | 2017-12-31 09:33:00 | 178 | Clinton St & Lake St | 23.0 | Kingsbury St & Kinzie St | 31.0 | 7.0 | 11.5 | partlycloudy |
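A boolean DataFrame is often reduced to one value per row with `all(axis=1)` or `any(axis=1)`. Either produces a boolean Series that can select rows. Here is a minimal sketch on made-up numbers.

```python
import pandas as pd

nums = pd.DataFrame({"tripduration": [993, 3, 1040], "wind_speed": [12.7, 2.0, 4.0]})

mask = nums > 5          # boolean DataFrame with the same shape as nums
print(mask)

# all(axis=1) is True only for rows where every column passed the test
print(nums[mask.all(axis=1)])   # keeps only row 0
# any(axis=1) is True for rows where at least one column passed
print(mask.any(axis=1))         # True, False, True
```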


## Overlap of DataFrame and Series methods

Most of the methods that exist for Series also exist for DataFrames and vice versa. This is good news, as it would be a hassle to have a different set of methods for such similar objects. In this section, we find all the attributes and methods that are either in common or unique to Series and DataFrames. We begin by using a set comprehension to get all of the public attributes and methods (those that don't begin with an underscore) for each type.

In [28]:
df_public = {method for method in dir(pd.DataFrame) 
             if not method.startswith('_')}
series_public = {method for method in dir(pd.Series) 
                 if not method.startswith('_')}

Let's output the total number of public attributes and methods for each type. Notice how they have nearly the same number.

In [29]:
len(df_public), len(series_public)

(209, 210)

Let's find the number they have in common. The ampersand computes the intersection of two sets. About 85% of the attributes and methods are shared (179 of 209 for DataFrames and 179 of 210 for Series).

In [31]:
len(df_public & series_public)

179

In [51]:
print(df_public & series_public)

{'replace', 'nlargest', 'pop', 'idxmin', 'info', 'at_time', 'reindex', 'to_csv', 'loc', 'clip', 'convert_dtypes', 'cummax', 'swaplevel', 'T', 'between_time', 'bfill', 'to_period', 'subtract', 'reset_index', 'swapaxes', 'tz_localize', 'notna', 'div', 'to_string', 'interpolate', 'index', 'mask', 'where', 'kurtosis', 'sort_values', 'tail', 'nunique', 'to_pickle', 'abs', 'to_numpy', 'rmod', 'dropna', 'expanding', 'copy', 'pct_change', 'rfloordiv', 'divide', 'equals', 'combine_first', 'fillna', 'to_clipboard', 'hist', 'any', 'to_markdown', 'transpose', 'ffill', 'prod', 'median', 'squeeze', 'droplevel', 'rolling', 'reindex_like', 'isnull', 'memory_usage', 'idxmax', 'unstack', 'keys', 'drop', 'first', 'rename_axis', 'size', 'mul', 'sort_index', 'to_sql', 'rtruediv', 'axes', 'cummin', 'max', 'le', 'plot', 'sub', 'to_dict', 'empty', 'nsmallest', 'to_excel', 'rsub', 'mode', 'ge', 'ndim', 'aggregate', 'cumsum', 'agg', 'floordiv', 'set_axis', 'apply', 'isna', 'cumprod', 'mean', 'groupby', 'set_fla

### Attributes and methods unique to DataFrames

The minus sign computes the difference between one set and another. It returns all of the elements unique to the first set. Below we return the attributes and methods unique to DataFrames.

In [32]:
print(df_public - series_public)

{'pivot_table', 'stack', 'assign', 'from_dict', 'from_records', 'eval', 'to_gbq', 'to_stata', 'to_orc', 'query', 'iterrows', 'melt', 'to_records', 'insert', 'corrwith', 'to_html', 'select_dtypes', 'merge', 'to_feather', 'join', 'to_parquet', 'pivot', 'style', 'itertuples', 'boxplot', 'columns', 'applymap', 'set_index', 'to_xml', 'isetitem'}


### Attributes and methods unique to Series

Reversing the operation returns the attributes and methods unique to Series.

In [33]:
print(series_public - df_public)

{'to_frame', 'list', 'argsort', 'autocorr', 'dtype', 'view', 'struct', 'name', 'rdivmod', 'argmin', 'is_monotonic_increasing', 'is_monotonic_decreasing', 'item', 'tolist', 'unique', 'to_list', 'divmod', 'nbytes', 'argmax', 'repeat', 'searchsorted', 'str', 'cat', 'case_when', 'is_unique', 'array', 'factorize', 'between', 'ravel', 'hasnans', 'dt'}
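You can use the same set comprehensions to check individual names or compute the shared fraction for whichever pandas version you have installed. The exact counts vary between versions, so this sketch checks only names that appear in the outputs above.

```python
import pandas as pd

# Same set comprehensions as above
df_public = {name for name in dir(pd.DataFrame) if not name.startswith("_")}
series_public = {name for name in dir(pd.Series) if not name.startswith("_")}

common = df_public & series_public
# Fraction of each type's public API that the other type also has
print(round(len(common) / len(df_public), 2), round(len(common) / len(series_public), 2))

print("select_dtypes" in df_public - series_public)   # True: DataFrame only
print("str" in series_public - df_public)             # True: Series only
```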


## Data Dictionaries

A data dictionary is an important element of a data analysis and, at a minimum, gives the name and a description of each column. It can also hold other information about each column, such as its data type or number of missing values.

The college dataset has a data dictionary available that can be read in as a DataFrame. It contains the descriptions of each column, which is important with this dataset, as the column names are not easily decipherable.

In [52]:
# college has more than 20 columns (the default display limit), so show up to 40
pd.set_option('display.max_columns', 40) 
college = pd.read_csv('../data/college.csv', index_col='instnm')
college.head(3)

| instnm | city | stabbr | hbcu | menonly | womenonly | relaffil | satvrmid | satmtmid | distanceonly | ugds | ugds_white | ugds_black | ugds_hisp | ugds_asian | ugds_aian | ugds_nhpi | ugds_2mor | ugds_nra | ugds_unkn | pptug_ef | curroper | pctpell | pctfloan | ug25abv | md_earn_wne_p10 | grad_debt_mdn_supp |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Alabama A & M University | Normal | AL | 1.0 | 0.0 | 0.0 | 0 | 424.0 | 420.0 | 0.0 | 4206.0 | 0.0333 | 0.9353 | 0.0055 | 0.0019 | 0.0024 | 0.0019 | 0.0 | 0.0059 | 0.0138 | 0.0656 | 1 | 0.7356 | 0.8284 | 0.1049 | 30300 | 33888.0 |
| University of Alabama at Birmingham | Birmingham | AL | 0.0 | 0.0 | 0.0 | 0 | 570.0 | 565.0 | 0.0 | 11383.0 | 0.5922 | 0.26 | 0.0283 | 0.0518 | 0.0022 | 0.0007 | 0.0368 | 0.0179 | 0.01 | 0.2607 | 1 | 0.346 | 0.5214 | 0.2422 | 39700 | 21941.5 |
| Amridge University | Montgomery | AL | 0.0 | 0.0 | 0.0 | 1 | NaN | NaN | 1.0 | 291.0 | 0.299 | 0.4192 | 0.0069 | 0.0034 | 0.0 | 0.0 | 0.0 | 0.0 | 0.2715 | 0.4536 | 1 | 0.6801 | 0.7795 | 0.854 | 40100 | 23370.0 |


The data dictionary is a CSV, which can be read in as a DataFrame to help understand the information in the college DataFrame.

In [53]:
pd.read_csv('../data/dictionaries/college_data_dictionary.csv').head()

| | column_name | description |
|---|---|---|
| 0 | instnm | Institution Name |
| 1 | city | City Location |
| 2 | stabbr | State Abbreviation |
| 3 | hbcu | Historically Black College or University |
| 4 | menonly | 0/1 Men Only |
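The beginning of this section mentioned that a data dictionary can hold extra information about each column. As a rough sketch of that idea, the code below builds a small data dictionary from the descriptions shown above, then adds each column's data type and number of missing values. The four-column `college` DataFrame is a hand-made excerpt of the first three rows, not the full dataset.

```python
import pandas as pd

# Hand-made excerpt of the first three rows of college
college = pd.DataFrame(
    {
        "city": ["Normal", "Birmingham", "Montgomery"],
        "stabbr": ["AL", "AL", "AL"],
        "hbcu": [1.0, 0.0, 0.0],
        "menonly": [0.0, 0.0, 0.0],
    },
    index=pd.Index(
        ["Alabama A & M University", "University of Alabama at Birmingham", "Amridge University"],
        name="instnm",
    ),
)

# Descriptions copied from the data dictionary output above
data_dict = pd.DataFrame({
    "column_name": ["city", "stabbr", "hbcu", "menonly"],
    "description": ["City Location", "State Abbreviation",
                    "Historically Black College or University", "0/1 Men Only"],
}).set_index("column_name")

# Both Series below are indexed by column name, so they align with data_dict's index
data_dict["dtype"] = college.dtypes
data_dict["n_missing"] = college.isna().sum()
print(data_dict)
```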


## Exercises

### Exercise 1

<span  style="color:green; font-size:16px">Select only the 64-bit float columns from the `college` DataFrame. How many are there?</span>

In [65]:
college.dtypes

city                   object
stabbr                 object
hbcu                  float64
menonly               float64
womenonly             float64
relaffil                int64
satvrmid              float64
satmtmid              float64
distanceonly          float64
ugds                  float64
ugds_white            float64
ugds_black            float64
ugds_hisp             float64
ugds_asian            float64
ugds_aian             float64
ugds_nhpi             float64
ugds_2mor             float64
ugds_nra              float64
ugds_unkn             float64
pptug_ef              float64
curroper                int64
pctpell               float64
pctfloan              float64
ug25abv               float64
md_earn_wne_p10        object
grad_debt_mdn_supp     object
dtype: object

In [None]:
college.select_dtypes('float64').shape

(7535, 20)


### Exercise 2
<span  style="color:green; font-size:16px">When you call the `info` method on a DataFrame, one of the very last items in its output is the count of columns for each data type. Can you think of a different combination of pandas operations that would return this as a Series?</span>

In [59]:
college.info()

<class 'pandas.core.frame.DataFrame'>
Index: 7535 entries, Alabama A & M University to Excel Learning Center-San Antonio South
Data columns (total 26 columns):
 #   Column              Non-Null Count  Dtype  
---  ------              --------------  -----  
 0   city                7535 non-null   object 
 1   stabbr              7535 non-null   object 
 2   hbcu                7164 non-null   float64
 3   menonly             7164 non-null   float64
 4   womenonly           7164 non-null   float64
 5   relaffil            7535 non-null   int64  
 6   satvrmid            1185 non-null   float64
 7   satmtmid            1196 non-null   float64
 8   distanceonly        7164 non-null   float64
 9   ugds                6874 non-null   float64
 10  ugds_white          6874 non-null   float64
 11  ugds_black          6874 non-null   float64
 12  ugds_hisp           6874 non-null   float64
 13  ugds_asian          6874 non-null   float64
 14  ugds_aian           6874 non-null   float64
 15  ug

In [62]:
college.dtypes.value_counts()

float64    20
object      4
int64       2
Name: count, dtype: int64