<a href="https://colab.research.google.com/github/laketalkemp/2025-HUDS-Bootcamp/blob/main/Introduction_to_Pandas.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# <center>  Introduction to pandas </center>
<div>
<img src="https://pandas.pydata.org/static/img/pandas.svg" width="600"/>
</div>

Previously, we learned about some useful data structures for storing and organizing data, including lists, dictionaries, tuples, and arrays. In this lecture, we will learn about the **pandas** library, some of its features, and the new data structures that it provides.

Pandas is a popular Python library used by many data scientists. It adds functionality to Python that expands one's ability to store, organize, and analyze data. The additional data structures that come with the pandas library are ***series*** and ***dataframes***. If you have experience working with spreadsheets in Microsoft Excel, working with pandas series and dataframes will look familiar. Essentially, series and dataframes allow for the storage of data in a tabular format.

For more information on pandas and documentation on functionalities of the pandas library, refer to the <a href="https://pandas.pydata.org/docs/index.html">official pandas webpage</a>. The <a href="https://pandas.pydata.org/docs/reference/index.html">API reference page</a> is an extensive resource for many pandas functions and methods.

# Series

A pandas series can be thought of as a 1D array or a single column in a table.

To make a series, we first must import the pandas library. It is common convention that Pandas is imported as `pd`. Then, we can make a series by using the `pd.Series()` function and utilize a list or array as the input to the function, as shown below:

In [1]:
import pandas as pd

In [2]:
#This is a list of lists, so each element in the larger list is itself a list. Each inner list holds multiple data types (strings and an integer).
ancestors_info = [['Malcolm X','Omaha, Nebraska',1925],['Martin Luther King Jr.','Atlanta, Georgia',1929], ['Nana Yaa Asantewaa','Besease, Ghana',1840],['Carter G. Woodson','New Canton, VA',1875]]

#Now we can use pandas to create a dataframe from the list above. It's important to make sure that the column names describe the data correctly.
ancestors_df = pd.DataFrame(ancestors_info,columns=['Name','Place of Birth','Year of Birth'])
ancestors_df.head()

Unnamed: 0,Name,Place of Birth,Year of Birth
0,Malcolm X,"Omaha, Nebraska",1925
1,Martin Luther King Jr.,"Atlanta, Georgia",1929
2,Nana Yaa Asantewaa,"Besease, Ghana",1840
3,Carter G. Woodson,"New Canton, VA",1875


In [3]:
groceries = pd.Series(['Doritos', 'Bananas', 'Broccoli', 'Chicken'])

groceries

Unnamed: 0,0
0,Doritos
1,Bananas
2,Broccoli
3,Chicken


Calling `groceries` shows us a series of four items that is indexed from 0 to 3 (inclusive). To name the indices, we can pass another list or array into the `.set_axis()` method.

In [4]:
groceries = groceries.set_axis(['Snack', 'Fruit', 'Vegetable', 'Meat'])

groceries

Unnamed: 0,0
Snack,Doritos
Fruit,Bananas
Vegetable,Broccoli
Meat,Chicken


When creating a series, the index can also be set. To achieve the same outcome as above, we can use the `index` parameter in the `pd.Series()` function:

In [5]:
groceries2 = pd.Series(['Doritos', 'Bananas', 'Broccoli', 'Chicken'],
         index = ['Snack', 'Fruit', 'Vegetable', 'Meat'])

groceries2

Unnamed: 0,0
Snack,Doritos
Fruit,Bananas
Vegetable,Broccoli
Meat,Chicken


Series can be indexed by position (numerically), similar to indexing an array. They can also be indexed by index label (name). Below, we select a single item and a range of items, both by position and by label:

In [7]:
print(groceries2.iloc[1])    # Select by position
print(groceries2["Fruit"])   # Select by index label

Bananas
Bananas


In [6]:
print(groceries2["Fruit":"Meat"])
print('\n')                          # Prints a new line
print(groceries2[1:4])

Fruit         Bananas
Vegetable    Broccoli
Meat          Chicken
dtype: object


Fruit         Bananas
Vegetable    Broccoli
Meat          Chicken
dtype: object


Notice that when slicing by index label, the end of the slice is **included**. When slicing by position, the end of the slice is **excluded**.
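The difference between the two kinds of slicing can be made explicit with the `.loc` (label-based) and `.iloc` (position-based) indexers. The following is a minimal sketch that rebuilds the grocery series from above; the names `by_label` and `by_position` are chosen here only for illustration:

```python
import pandas as pd

groceries = pd.Series(
    ['Doritos', 'Bananas', 'Broccoli', 'Chicken'],
    index=['Snack', 'Fruit', 'Vegetable', 'Meat'],
)

# .loc selects by label; the end label of the slice is included.
by_label = groceries.loc['Fruit':'Vegetable']

# .iloc selects by position; the end position of the slice is excluded.
by_position = groceries.iloc[1:2]

print(by_label)      # two items: Fruit and Vegetable
print(by_position)   # one item: Fruit
```

Using `.loc` and `.iloc` states the intent of each selection directly, which helps when the index labels are themselves numbers.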

# Dataframes

Dataframes are essentially tables that consist of multiple series. Below, we see the multiple ways a dataframe can be made.


### Making a dataframe using a dictionary
To construct a dataframe using a dictionary, you can use the `pd.DataFrame()` function. Passing a dictionary into this function creates a dataframe along the column axis; the keys of the dictionary become the column titles, while the values of each key become the rows of that column. The `index` parameter can also be passed into the function, but it must be defined outside of the dictionary as a list:

In [12]:
grocery_df1 = pd.DataFrame(
    {"Item": ['Doritos', 'Bananas', 'Broccoli', 'Chicken'],
     "Unit Price": [3.99, 0.50, 2.00, 5.00],
     "Quantity": [2, 5, 1, 3]},
    index = ['Snack', 'Fruit', 'Vegetable', 'Meat']
)

print(grocery_df1)
print("\n")
grocery_df1

               Item  Unit Price  Quantity
Snack       Doritos        3.99         2
Fruit       Bananas        0.50         5
Vegetable  Broccoli        2.00         1
Meat        Chicken        5.00         3




Unnamed: 0,Item,Unit Price,Quantity
Snack,Doritos,3.99,2
Fruit,Bananas,0.5,5
Vegetable,Broccoli,2.0,1
Meat,Chicken,5.0,3


### Making a dataframe using lists

Another way to construct a dataframe is by using lists. When a list of lists is passed into the `pd.DataFrame()` function, the dataframe is constructed along the row axis; each inner list becomes a single row in the dataframe. Using this method, columns can be named by passing a list into the `columns` parameter:

In [13]:
grocery_df2 = pd.DataFrame(
    [['Doritos', 3.99, 2],
     ['Bananas', 0.50, 5],
     ['Broccoli', 2.00, 1],
     ['Chicken', 5.00, 3]],
    index = ['Snack', 'Fruit', 'Vegetable', 'Meat'],
    columns = ["Item","Unit Price","Quantity"]
    )

grocery_df2

Unnamed: 0,Item,Unit Price,Quantity
Snack,Doritos,3.99,2
Fruit,Bananas,0.5,5
Vegetable,Broccoli,2.0,1
Meat,Chicken,5.0,3
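As a quick check that the column-wise (dictionary) and row-wise (list of lists) constructions describe the same table, the two results can be compared with the `.equals()` method. This is a small sketch that rebuilds both dataframes; the variable names are chosen here for illustration:

```python
import pandas as pd

labels = ['Snack', 'Fruit', 'Vegetable', 'Meat']

# Column-wise: each dictionary key becomes a column.
df_from_dict = pd.DataFrame(
    {"Item": ['Doritos', 'Bananas', 'Broccoli', 'Chicken'],
     "Unit Price": [3.99, 0.50, 2.00, 5.00],
     "Quantity": [2, 5, 1, 3]},
    index=labels,
)

# Row-wise: each inner list becomes a row.
df_from_rows = pd.DataFrame(
    [['Doritos', 3.99, 2],
     ['Bananas', 0.50, 5],
     ['Broccoli', 2.00, 1],
     ['Chicken', 5.00, 3]],
    index=labels,
    columns=["Item", "Unit Price", "Quantity"],
)

# .equals() compares the values, labels, and column data types.
same = df_from_dict.equals(df_from_rows)
print(same)
```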


### Making a dataframe using pandas series

Finally, multiple series can be used to construct a dataframe by using the `pd.concat()` function. Using this function, one can pass a list of series and specify the method of concatenation/joining through the `axis` parameter. Concatenation while `axis = 0` means that the series will be joined as additional rows; concatenation while `axis = 1` means that the series will be joined together as columns. The `keys` parameter sets the column titles and should be defined using a list.

Once the series have been concatenated into a dataframe, the indexes of the dataframe will start at 0 by default. The `.set_axis()` method can be used on the dataframe to define the index values/titles:

In [14]:
items = pd.Series(['Doritos', 'Bananas', 'Broccoli', 'Chicken'])
unit_price = pd.Series([3.99,0.50,2.00, 5.00])
quantity = pd.Series([2,5,1,3])
indices = pd.Series(['Snack', 'Fruit', 'Vegetable', 'Meat'])

grocery_df3 = pd.concat([items, unit_price, quantity], axis = 1, keys = ['Item', 'Unit Price', 'Quantity'])
grocery_df3 = grocery_df3.set_axis(indices)

grocery_df3

Unnamed: 0,Item,Unit Price,Quantity
Snack,Doritos,3.99,2
Fruit,Bananas,0.5,5
Vegetable,Broccoli,2.0,1
Meat,Chicken,5.0,3
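To see the effect of the `axis` parameter, the sketch below concatenates the same two series both ways. It uses shortened versions of the series above, and the names `stacked` and `side_by_side` are illustrative only:

```python
import pandas as pd

items = pd.Series(['Doritos', 'Bananas'])
quantity = pd.Series([2, 5])

# axis=0 stacks the series end to end, giving one longer series.
stacked = pd.concat([items, quantity], axis=0)

# axis=1 places the series side by side, giving a dataframe with two columns.
side_by_side = pd.concat([items, quantity], axis=1, keys=['Item', 'Quantity'])

print(stacked.shape)        # (4,)
print(side_by_side.shape)   # (2, 2)
```

With `axis = 0`, the original index values are kept, so the stacked series has repeated index values (0, 1, 0, 1).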


### Loading Data

Python can read several types of files. Below are some useful functions to load data files:

- `pd.read_csv(file)` : Loads comma separated values files (.csv files). Requires `pandas` to be imported first.

- `pd.read_excel(file)` : Loads Microsoft Excel files (.xlsx files). Requires `pandas` to be imported first.

- `open(file, mode)` : Opens text files (.txt files). The `mode` parameter is optional and determines how the file is opened. When `mode` is not specified, the default argument `'r'` is used, which opens the file for reading.

We will use `pd.read_csv()` to read data stored in comma separated values (.csv) files in this lecture.
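The sketch below illustrates the difference between `open()` and `pd.read_csv()` on the same file. To keep it self-contained, it first writes a small, hypothetical csv file (`groceries.csv`, invented for this example) into a temporary folder:

```python
import os
import tempfile

import pandas as pd

# Hypothetical file contents, used only for illustration.
csv_text = "Item,Unit Price,Quantity\nDoritos,3.99,2\nBananas,0.50,5\n"

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "groceries.csv")

    # open() in write mode ('w') creates the file.
    with open(path, "w") as f:
        f.write(csv_text)

    # open() in the default read mode ('r') gives back the raw text.
    with open(path) as f:
        raw_text = f.read()

    # pd.read_csv() parses the same file into a dataframe.
    sample_df = pd.read_csv(path)

print(raw_text)
print(sample_df)
```

`open()` returns the contents as plain text, while `pd.read_csv()` uses the first line as column titles and converts the remaining lines into typed columns.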

When loading data into Python, we can store the contents of a csv file as a dataframe for downstream processing and analysis. Below, we load a file called `Affordable_Rental_Housing_Developments.csv` using `pd.read_csv()` and save it as a dataframe called `housing`:


In Google Colab:
1. Upload the file to your Google Drive and mount the drive
2. Pass the file path to `pd.read_csv()`
In [15]:
from google.colab import drive
drive.mount('/content/drive', force_remount=True)

Mounted at /content/drive


In [16]:
housing = pd.read_csv("/content/drive/MyDrive/Colab Notebooks/Affordable_Rental_Housing_Developments.csv")

housingv2 = housing
housingv2

Unnamed: 0,Community Area Name,Community Area Number,Property Type,Property Name,Address,Zip Code,Phone Number,Management Company,Units,X Coordinate,Y Coordinate,Latitude,Longitude,Location
0,Belmont Cragin,19,Senior,Belmont Place Apts.,4645 W. Belmont Ave.,60641,773-427-1400,Perlmark Realty Management,110.0,1144792.615,1920829.074,41.938767,-87.743263,
1,East Garfield Park,27,Multifamily,East Garfield Park Place,3441 W. Monroe St.,60624,,,25.0,,,,,
2,East Garfield Park,27,Multifamily,Homan Square Apartments Phase I,750 S. Homan Ave.,60624,773-722-7320,Realty & Mortgage Co.,50.0,1153790.485,1896268.991,41.871197,-87.710848,"(41.8711971734122, -87.7108484249133)"
3,Oakland,36,Multifamily,Drexel Street Properties,848 E. 40th St.,60653,312-842-5500,East Lake Management,61.0,1182895.852,1878499.666,41.821808,-87.604546,"(41.82180819274104, -87.60454649932994)"
4,Humboldt Park,23,Multifamily,Nelson Mandela Apts.,3118 W. Franklin Blvd.,60624,773-227-6332,Bickerdike Apts.,6.0,1155403.532,1903205.531,41.890199,-87.704740,"(41.8901994819175, -87.7047397753891)"
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
506,Woodlawn,42,Multifamily,65th Street Apts.,817 E. 64th St.,60637,773-684-2056,"Urban Property Advisors, LLC",6.0,1182886.226,1862714.794,41.778493,-87.605072,"(41.7784934339328, -87.6050724236176)"
507,West Ridge,2,ARO,"5822 North Western, LLC",5822 N. Western Ave.,60659,773-572-2755,Chicago Apartment Finders,2.0,1159227.225,1938700.509,41.987522,-87.689719,"(41.9875222634019, -87.6897187054826)"
508,Oakland,36,Multifamily,Lakefront Phase II,1120 E. Bowen Ave.,60653,773-548-8795,"Urban Property Advisors, LLC",6.0,1184352.470,1877438.040,41.818861,-87.599236,"(41.8188609892833, -87.5992361922266)"
509,Lower West Side,31,ARO,Woodwork Lofts,1414-48 W. 21st St.,60608,312-576-7392,Stellar Performance Inc.,10.0,1167176.941,1890229.839,41.854348,-87.661875,"(41.8543483215561, -87.6618752555649)"


When calling `housing`, we see a dataframe that contains data on affordable rental housing developments, including each property's community area, property type, address, management company, and number of units.

To set a column as the index of the dataframe, the `.set_index()` method can be used. The name of the desired column can be passed as a string into the method:


In [None]:
housing_by_name = housing.set_index('Property Name')
housing_by_name

We can obtain the index labels by calling `.index` on the dataframe:

In [None]:
housing_by_name.index
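As a small, self-contained illustration of `.set_index()` and `.index`, the sketch below uses a grocery dataframe like the one built earlier; the variable names are chosen here for illustration:

```python
import pandas as pd

grocery_df = pd.DataFrame({
    "Item": ['Doritos', 'Bananas', 'Broccoli', 'Chicken'],
    "Unit Price": [3.99, 0.50, 2.00, 5.00],
    "Quantity": [2, 5, 1, 3],
})

# set_index() returns a new dataframe; grocery_df itself is unchanged.
indexed = grocery_df.set_index("Item")

print(indexed.index)     # the item names are now the index labels
print(indexed.columns)   # "Item" is no longer a regular column
```

Because `.set_index()` returns a new dataframe, the result has to be assigned to a variable for the change to be kept.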

By default, pandas displays up to 60 rows of a dataframe. If a dataframe exceeds 60 rows, a truncated version is displayed that shows the first and last five rows.

When working with larger datasets, the number of rows that are shown can be adjusted with the `pd.set_option()` function. This function takes two arguments: the option you want to set and the value to which to set the option.

By passing `'display.max_rows'` as the first argument and the number `15` as the second argument, we set the option in our environment to display a truncated version of any dataframe that exceeds 15 rows. A similar approach could be taken with columns by passing `'display.max_columns'` as an argument. This can be useful if you would like to see all the rows and columns of a dataframe or if you want to abbreviate the dataframe after a certain number of rows and columns:

In [17]:
pd.set_option('display.max_rows', 15)
housing

Unnamed: 0,Community Area Name,Community Area Number,Property Type,Property Name,Address,Zip Code,Phone Number,Management Company,Units,X Coordinate,Y Coordinate,Latitude,Longitude,Location
0,Belmont Cragin,19,Senior,Belmont Place Apts.,4645 W. Belmont Ave.,60641,773-427-1400,Perlmark Realty Management,110.0,1144792.615,1920829.074,41.938767,-87.743263,
1,East Garfield Park,27,Multifamily,East Garfield Park Place,3441 W. Monroe St.,60624,,,25.0,,,,,
2,East Garfield Park,27,Multifamily,Homan Square Apartments Phase I,750 S. Homan Ave.,60624,773-722-7320,Realty & Mortgage Co.,50.0,1153790.485,1896268.991,41.871197,-87.710848,"(41.8711971734122, -87.7108484249133)"
3,Oakland,36,Multifamily,Drexel Street Properties,848 E. 40th St.,60653,312-842-5500,East Lake Management,61.0,1182895.852,1878499.666,41.821808,-87.604546,"(41.82180819274104, -87.60454649932994)"
4,Humboldt Park,23,Multifamily,Nelson Mandela Apts.,3118 W. Franklin Blvd.,60624,773-227-6332,Bickerdike Apts.,6.0,1155403.532,1903205.531,41.890199,-87.704740,"(41.8901994819175, -87.7047397753891)"
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
506,Woodlawn,42,Multifamily,65th Street Apts.,817 E. 64th St.,60637,773-684-2056,"Urban Property Advisors, LLC",6.0,1182886.226,1862714.794,41.778493,-87.605072,"(41.7784934339328, -87.6050724236176)"
507,West Ridge,2,ARO,"5822 North Western, LLC",5822 N. Western Ave.,60659,773-572-2755,Chicago Apartment Finders,2.0,1159227.225,1938700.509,41.987522,-87.689719,"(41.9875222634019, -87.6897187054826)"
508,Oakland,36,Multifamily,Lakefront Phase II,1120 E. Bowen Ave.,60653,773-548-8795,"Urban Property Advisors, LLC",6.0,1184352.470,1877438.040,41.818861,-87.599236,"(41.8188609892833, -87.5992361922266)"
509,Lower West Side,31,ARO,Woodwork Lofts,1414-48 W. 21st St.,60608,312-576-7392,Stellar Performance Inc.,10.0,1167176.941,1890229.839,41.854348,-87.661875,"(41.8543483215561, -87.6618752555649)"
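`pd.set_option()` changes the setting for the rest of the session. If the change is only needed for a single display, one alternative is `pd.option_context()`, which applies a setting only inside a `with` block. The sketch below assumes a made-up 100-row dataframe called `numbers`:

```python
import pandas as pd

# Hypothetical dataframe with 100 rows, used only for illustration.
numbers = pd.DataFrame({"n": range(100)})

# Look up the current setting before changing anything.
original = pd.get_option("display.max_rows")

# The setting applies only inside the with-block.
with pd.option_context("display.max_rows", 15):
    inside = pd.get_option("display.max_rows")
    print(numbers)   # truncated display

# Outside the block, the earlier setting is back in effect.
after = pd.get_option("display.max_rows")
print(inside, after == original)
```

To undo an earlier `pd.set_option()` call, `pd.reset_option('display.max_rows')` restores the pandas default.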


# Exploratory methods and functions for dataframes

When loading in data as dataframes, especially large datasets, you may want to get a quick overview of the data.

Two very useful dataframe methods that can help with this are the `.head()` and `.tail()` methods.  By default, calling `.head()` or `.tail()` on a dataframe will return the first five or the last five rows of the dataframe, respectively. Passing an integer into these methods will give you an output with that number of rows:

In [20]:
housing.head(15)

Unnamed: 0,Community Area Name,Community Area Number,Property Type,Property Name,Address,Zip Code,Phone Number,Management Company,Units,X Coordinate,Y Coordinate,Latitude,Longitude,Location
0,Belmont Cragin,19,Senior,Belmont Place Apts.,4645 W. Belmont Ave.,60641,773-427-1400,Perlmark Realty Management,110.0,1144792.615,1920829.074,41.938767,-87.743263,
1,East Garfield Park,27,Multifamily,East Garfield Park Place,3441 W. Monroe St.,60624,,,25.0,,,,,
2,East Garfield Park,27,Multifamily,Homan Square Apartments Phase I,750 S. Homan Ave.,60624,773-722-7320,Realty & Mortgage Co.,50.0,1153790.485,1896268.991,41.871197,-87.710848,"(41.8711971734122, -87.7108484249133)"
3,Oakland,36,Multifamily,Drexel Street Properties,848 E. 40th St.,60653,312-842-5500,East Lake Management,61.0,1182895.852,1878499.666,41.821808,-87.604546,"(41.82180819274104, -87.60454649932994)"
4,Humboldt Park,23,Multifamily,Nelson Mandela Apts.,3118 W. Franklin Blvd.,60624,773-227-6332,Bickerdike Apts.,6.0,1155403.532,1903205.531,41.890199,-87.70474,"(41.8901994819175, -87.7047397753891)"
5,Grand Boulevard,38,Senior,Bronzeville Senior Apts.,460 E. 41st St.,60653,773-924-2100,Peoples Co-Op Management Service,97.0,1180219.199,1877957.214,41.820382,-87.614382,"(41.8203815434508, -87.6143824353862)"
6,West Town,24,Multifamily,North and Talman Family Apts.,2654 W. North Ave.,60647,773-486-7410,Hispanic Housing Dev. Corp.,30.0,1158184.401,1910570.961,41.910354,-87.694326,"(41.9103544922391, -87.6943256396865)"
7,Norwood Park,10,Senior,Senior Suites of Norwood Park,5700 N. Harlem Ave.,60631,888-339-2315,Senior Lifestyle,84.0,1127328.815,1937441.873,41.984666,-87.807073,"(41.9846662832426, -87.8070730078682)"
8,North Lawndale,29,Multifamily,Dicksons Estates Apts.,1129 S. Sacramento Ave.,60612,773-638-0386,Dickson Estates Apartments,42.0,1156574.009,1894894.51,41.86737,-87.700666,"(41.8673696100651, -87.7006662254259)"
9,Montclare,18,ARO,6602-22 W. Diversey,6602-22 W. Diversey Ave.,60707,872.256.8321,The Brixen,3.0,1131884.372,1917931.507,41.93105,-87.790772,"(41.9310497553307, -87.7907721868365)"


In [21]:
housing.tail()

Unnamed: 0,Community Area Name,Community Area Number,Property Type,Property Name,Address,Zip Code,Phone Number,Management Company,Units,X Coordinate,Y Coordinate,Latitude,Longitude,Location
506,Woodlawn,42,Multifamily,65th Street Apts.,817 E. 64th St.,60637,773-684-2056,"Urban Property Advisors, LLC",6.0,1182886.226,1862714.794,41.778493,-87.605072,"(41.7784934339328, -87.6050724236176)"
507,West Ridge,2,ARO,"5822 North Western, LLC",5822 N. Western Ave.,60659,773-572-2755,Chicago Apartment Finders,2.0,1159227.225,1938700.509,41.987522,-87.689719,"(41.9875222634019, -87.6897187054826)"
508,Oakland,36,Multifamily,Lakefront Phase II,1120 E. Bowen Ave.,60653,773-548-8795,"Urban Property Advisors, LLC",6.0,1184352.47,1877438.04,41.818861,-87.599236,"(41.8188609892833, -87.5992361922266)"
509,Lower West Side,31,ARO,Woodwork Lofts,1414-48 W. 21st St.,60608,312-576-7392,Stellar Performance Inc.,10.0,1167176.941,1890229.839,41.854348,-87.661875,"(41.8543483215561, -87.6618752555649)"
510,Grand Boulevard,38,Multifamily,Cornerstone Apts.,633 E. 50th St.,60615,312-577-5555,The Community Builders Inc.,7.0,1181425.624,1871964.801,41.80391,-87.610142,"(41.8039101011155, -87.6101417642929)"


If a negative integer *n* is passed into the `.head()` method, all but the last *n* rows will be shown.

Likewise, if a negative integer *h* is passed into the `.tail()` method, all but the top *h* rows will be shown:

In [22]:
housing.head(-23)

Unnamed: 0,Community Area Name,Community Area Number,Property Type,Property Name,Address,Zip Code,Phone Number,Management Company,Units,X Coordinate,Y Coordinate,Latitude,Longitude,Location
0,Belmont Cragin,19,Senior,Belmont Place Apts.,4645 W. Belmont Ave.,60641,773-427-1400,Perlmark Realty Management,110.0,1144792.615,1920829.074,41.938767,-87.743263,
1,East Garfield Park,27,Multifamily,East Garfield Park Place,3441 W. Monroe St.,60624,,,25.0,,,,,
2,East Garfield Park,27,Multifamily,Homan Square Apartments Phase I,750 S. Homan Ave.,60624,773-722-7320,Realty & Mortgage Co.,50.0,1153790.485,1896268.991,41.871197,-87.710848,"(41.8711971734122, -87.7108484249133)"
3,Oakland,36,Multifamily,Drexel Street Properties,848 E. 40th St.,60653,312-842-5500,East Lake Management,61.0,1182895.852,1878499.666,41.821808,-87.604546,"(41.82180819274104, -87.60454649932994)"
4,Humboldt Park,23,Multifamily,Nelson Mandela Apts.,3118 W. Franklin Blvd.,60624,773-227-6332,Bickerdike Apts.,6.0,1155403.532,1903205.531,41.890199,-87.704740,"(41.8901994819175, -87.7047397753891)"
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
483,Logan Square,22,ARO,1962-66 N. Milwaukee Ave. Apts.,1962-66 N. Milwaukee Ave.,60647,773-494-4395,Kiferbaum Devp. Group,4.0,1159903.525,1913009.701,41.917011,-87.687943,"(41.917011260975, -87.6879428188384)"
484,East Garfield Park,27,Supportive Housing,East Park Apts.,3300 W. Maypole Ave.,60624,773-826-3300,The Habitat Co.,152.0,1154287.339,1900828.130,41.883698,-87.708902,"(41.8836980221101, -87.7089025001699)"
485,East Garfield Park,27,Multifamily,Liberty Square Apts.,705-23 S. Independence Ave.,60624,773-538-3800,Bonheur Realty Services Corp.,66.0,1151407.461,1896705.240,41.872441,-87.719586,"(41.8724413775792, -87.7195859944955)"
486,Lake View,6,ARO,3220 Lincoln,3220 N. Lincoln Ave.,60657,312-214-3563,3220 Lincoln LLC,2.0,1164928.833,1921523.945,41.940270,-87.669238,"(41.9402695807229, -87.6692376073269)"


In [23]:
housing.tail(-11)

Unnamed: 0,Community Area Name,Community Area Number,Property Type,Property Name,Address,Zip Code,Phone Number,Management Company,Units,X Coordinate,Y Coordinate,Latitude,Longitude,Location
11,Lower West Side,31,Multifamily,Casa Oaxaca,1714 W. 19th St.,60608,312-666-1323,The Resurrection Project,25.0,1165132.943,1890827.827,41.856033,-87.669361,"(41.8560328733339, -87.6693605654777)"
12,Albany Park,14,Senior,Mayfair Commons,4444 W. Lawrence Ave.,60630,773-205-7862,"Metroplex, Inc.",97.0,1145674.754,1931569.979,41.968224,-87.739747,"(41.9682242320605, -87.7397474865535)"
13,Englewood,68,Supportive Housing,Branch of Hope Apts.,5628-30 S. Halsted St.,60621,773-488-7205,Evergreen Real Estate LLC,99.0,1171893.839,1867410.662,41.791628,-87.645233,"(41.791627817593, -87.6452332086618)"
14,West Town,24,ARO,Westerly Apts.,740 N. Aberdeen St.,60642,312-366-2965,Fifield Cos.,3.0,1168866.667,1905178.379,41.895332,-87.655240,"(41.8953318242, -87.6552398396278)"
15,Oakland,36,Multifamily,Lakefront Phase II,1060 E. 41st St.,60653,773-548-8795,"Urban Property Advisors, LLC",81.0,1184020.764,1877980.711,41.820358,-87.600436,"(41.8203578869792, -87.6004360121251)"
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
506,Woodlawn,42,Multifamily,65th Street Apts.,817 E. 64th St.,60637,773-684-2056,"Urban Property Advisors, LLC",6.0,1182886.226,1862714.794,41.778493,-87.605072,"(41.7784934339328, -87.6050724236176)"
507,West Ridge,2,ARO,"5822 North Western, LLC",5822 N. Western Ave.,60659,773-572-2755,Chicago Apartment Finders,2.0,1159227.225,1938700.509,41.987522,-87.689719,"(41.9875222634019, -87.6897187054826)"
508,Oakland,36,Multifamily,Lakefront Phase II,1120 E. Bowen Ave.,60653,773-548-8795,"Urban Property Advisors, LLC",6.0,1184352.470,1877438.040,41.818861,-87.599236,"(41.8188609892833, -87.5992361922266)"
509,Lower West Side,31,ARO,Woodwork Lofts,1414-48 W. 21st St.,60608,312-576-7392,Stellar Performance Inc.,10.0,1167176.941,1890229.839,41.854348,-87.661875,"(41.8543483215561, -87.6618752555649)"
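The behavior of negative arguments is easier to verify on a small example. The sketch below uses a made-up 10-row dataframe (`numbers`) so that the remaining rows can be counted directly:

```python
import pandas as pd

# Hypothetical dataframe with index values 0 through 9.
numbers = pd.DataFrame({"n": range(10)})

# head(-3) returns everything except the last 3 rows.
first_rows = numbers.head(-3)

# tail(-4) returns everything except the first 4 rows.
last_rows = numbers.tail(-4)

print(list(first_rows.index))   # [0, 1, 2, 3, 4, 5, 6]
print(list(last_rows.index))    # [4, 5, 6, 7, 8, 9]
```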


When working with large datasets with many variables, the `.columns`, `.shape`, and `.size` attributes and the `.info()` method can be helpful.

The `.columns` attribute returns an `Index` object containing the column names when called on a dataframe:

In [25]:
housing_columns = housing.columns

In [26]:
housing_columns

Index(['Community Area Name', 'Community Area Number', 'Property Type',
       'Property Name', 'Address', 'Zip Code', 'Phone Number',
       'Management Company', 'Units', 'X Coordinate', 'Y Coordinate',
       'Latitude', 'Longitude', 'Location'],
      dtype='object')

The `.shape` attribute returns a tuple of the number of rows and columns within the dataframe:

In [27]:
housing.shape

(511, 14)

The `.size` attribute returns the number of total data points within the dataframe (i.e. the number of rows * the number of columns):

In [28]:
housing.size

7154

Lastly, the `.info()` method provides information on the dataframe, including the range of the indexes, the number of non-null values and the data type of each column, and the memory usage of the dataframe:

In [29]:
housing.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 511 entries, 0 to 510
Data columns (total 14 columns):
 #   Column                 Non-Null Count  Dtype  
---  ------                 --------------  -----  
 0   Community Area Name    511 non-null    object 
 1   Community Area Number  511 non-null    int64  
 2   Property Type          511 non-null    object 
 3   Property Name          511 non-null    object 
 4   Address                511 non-null    object 
 5   Zip Code               511 non-null    int64  
 6   Phone Number           510 non-null    object 
 7   Management Company     510 non-null    object 
 8   Units                  510 non-null    float64
 9   X Coordinate           510 non-null    float64
 10  Y Coordinate           510 non-null    float64
 11  Latitude               510 non-null    float64
 12  Longitude              510 non-null    float64
 13  Location               508 non-null    object 
dtypes: float64(5), int64(2), object(7)
memory usage: 56.0+ KB
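The relationship between these attributes can be checked on a small dataframe. The sketch below reuses the grocery data from earlier; `column_names`, `n_rows`, and `n_cols` are names chosen here for illustration:

```python
import pandas as pd

grocery_df = pd.DataFrame({
    "Item": ['Doritos', 'Bananas', 'Broccoli', 'Chicken'],
    "Unit Price": [3.99, 0.50, 2.00, 5.00],
    "Quantity": [2, 5, 1, 3],
})

# .columns is an Index object; .tolist() converts it to a plain Python list.
column_names = grocery_df.columns.tolist()

# .shape is a (rows, columns) tuple, so it can be unpacked.
n_rows, n_cols = grocery_df.shape

print(column_names)                          # ['Item', 'Unit Price', 'Quantity']
print(grocery_df.shape)                      # (4, 3)
print(grocery_df.size == n_rows * n_cols)    # True

# .dtypes lists the data type of each column without the rest of .info().
print(grocery_df.dtypes)
```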


# Activity


1. Load the dataframe from your group into Colab. Make the `"zip code"` column the index of the dataframe.

2. Explore the dataset. How many entries are there? How many variables are documented for each entry? What is the data type of each variable?