# **Python for Beginners: Part 1**

---------------------------------------------------------------------------

This training will introduce you to coding in Python and teach you some fundamental concepts for using Python for data analysis.

Specifically, we'll cover:

* data types and structures
* variables
* functions
* the pandas library
* importing a data set
* conducting exploratory data analysis


We will use Google Colab to execute the code for this training because it doesn't require any installation or setup before getting started. However, when using Python to analyze a data set containing confidential information, it is highly recommended that you do not use Google Colab because there is no guaranteed data privacy.


### **What is Python?**

* Python is a popular programming language in the data science and machine learning communities.
* Python can handle large amounts of data and be used to perform complex analysis.
* It is considered one of the more beginner-friendly programming languages.
* Python is "open source", which means that its source code is made available for use and modification.
  * Using Python is free!
* It has a large community of developers, which means it's relatively easy to find tutorials and get answers to questions.
* Python's capabilities can be extended even further with its extensive collection of libraries, which provide pre-written code.

### **Getting Started**

To conduct data analysis with Python, we will use what's called a "notebook." Notebooks contain a mixture of code and text, which makes it fairly easy to present your work in an organized way, reproduce your work on other data sets, and share your work with others. If you wanted to start a new notebook, you could do so by clicking File → New Notebook in the bar above.

### **Running Code**

When using Python, we can execute or run our code as soon as we write it, which makes it easy to test out ideas. To execute a code block, click the "play" symbol located on the left side of the code block. The output for the code will be displayed below the block.

In [18]:
1 + 1

2

In [None]:
10 - 2

8

The table below provides the symbols for some common operations.

| Operation | Symbol |
| --------- | :----- |
| Addition | `+` |
| Subtraction | `-` |
| Multiplication | `*` |
| Division | `/` |
| Exponentiation | `**` |
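
As a small, hedged illustration of the operators in the table above, the sketch below combines a few of them. The numbers and variable names are arbitrary and chosen only for this example. It shows that `/` returns a decimal number even when the result is whole, and that `**` is evaluated before `*` unless parentheses say otherwise.

```python
# division with / returns a decimal (float), even for whole-number results
quotient = 10 / 2
print(quotient)        # 5.0

# exponentiation is evaluated before multiplication
without_parens = 2 * 3 ** 2
print(without_parens)  # 18

# parentheses change the order of operations
with_parens = (2 * 3) ** 2
print(with_parens)     # 36
```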


**Practice**: Calculate $5 \times 10$ in the code block below.

In [None]:
5 * 10

50

**Practice**: Calculate $2^3$ in the code block below.

In [None]:
2**3

8

You can type multiple lines of code in the same code block, but output will only display for the last line of code. If you want to print multiple outputs, you can either use separate code blocks or you can use the built-in `print()` function.

It is common practice to use separate code blocks for separate topics.

In [None]:
#only the output for the last line prints
2**3
5+10

15

In [None]:
#printing both outputs
print(2**3)
print(5+10)

8
15


### **Comments**

Comments are short notes that you can place in code blocks. They are part of the code, but Python ignores them when the code is run. They are intended to provide clarity for people reading the code or prevent execution when testing code.

Comments start with the `#` symbol, and everything typed to the right of the `#` symbol will be ignored. You can confirm what is included in the comment because the text will be green.

In [None]:
#this is a comment

In the code below, Python adds `5 + 2` and ignores everything to the right of `#`.



In [None]:
5 + 2 #adding 5 + 2

7

In [None]:
#the code below will run because it's on a new line
5 + 8

13

In [None]:
#we can also create comments that
#are multiple lines

### **Variables**

Variables are essentially containers for storing data. You can also think of them as names that you assign to objects. Variables make it easy to store and reference data in our code.

To create a variable, we use the `=` sign.


In [None]:
x = 3

After you've defined a variable, you can see what it equals by typing the variable name or using the `print()` function.

In [None]:
x

3

In [None]:
print(x)

3


There are several rules when it comes to naming variables:


* A variable name **cannot** contain spaces.
* The first character in the name must be either a letter or an underscore.
  * After the first character, you can use numbers too.
*  Uppercase and lowercase letters are distinct.
  * `a` and `A` are different variables.
* A variable name **cannot** be one of Python's keywords, such as `for`, `and`, `or`, `else`, and `in`.
* It is recommended that you choose names that are somewhat informative and short.
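
The naming rules above can be checked with a short sketch. The variable names here are made up for illustration. The `keyword` module and the `isidentifier()` string method are part of standard Python, but they are not otherwise covered in this training; they are used only as a convenient way to test a name.

```python
import keyword

# valid names: letters, underscores, and numbers (but not as the first character)
total_sales = 100
_temp = 5
sales2023 = 250

# uppercase and lowercase names are different variables
a = 1
A = 2
print(a == A)  # False

# keyword.iskeyword() checks whether a name is reserved by Python
print(keyword.iskeyword('for'))          # True
print(keyword.iskeyword('total_sales'))  # False

# isidentifier() checks the spelling rules (no spaces, no leading number)
print('total sales'.isidentifier())  # False
print('2sales'.isidentifier())       # False
```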

In Colab, clicking on the $\{x\}$ symbol to the left will provide a list of variables currently in use.

It is important to note that you can save over existing variables! We defined `x` as $3$ earlier, but if we run the code below, `x` is now $10$.



In [None]:
x = 10
x

10

**Practice**:

**Practice**:

### **Data Types**

Variables can store different types of data, which have unique purposes. The table below contains some frequently used data types. There are more data types, such as complex numbers, but we won't discuss these.

| Type | Description | Example |
| :--------- |:----- |:----|
| str | text/characters  | `'hello'` |
| int | integer/whole number | `1` |
| float | decimals | `1.2` |
| bool | logical value | `True`, `False` |
| list | ordered collection of data | `['apple', 'banana', 'cherry']` |
| set | unordered collection of data | `{'apple', 'banana', 'cherry'}` |


To check the data type for a variable, you can use the built-in `type()` function. Let's check the type of the variable, `x`, we defined earlier.

In [None]:
type(x)

int

In [None]:
type('hello')

str

In [None]:
type([1, 2, 3])

list

**Practice**: In the code block below, find the data type for `31.8`.

In [None]:
type(31.8)

float

**Practice**: In the code block below, find the data type for `{2, 4, 6}`.

In [None]:
type({2, 4, 6})

set

#### **Booleans**

Booleans (bool) represent one of two values: `True` or `False`.

What would we use the Boolean data type for? In programming, we often want to know if a statement is `True` or `False`.  

In [None]:
10 > 9

True

In [None]:
type(10 > 9)

bool

However, if we wanted to check for equality, we can't use `=` since that's how we assign variables. To check for equality, Python uses `==`. The table below provides the symbols for logical operators.

| Operator | Symbol |
| --------- | :----- |
| equal to | `==`  |
| greater than | `>` |  
| less than | `<`  |
| greater than or equal to | `>=` |
| less than or equal to | `<=` |

In [None]:
1 == 1

True

**Practice**: In the code block below, create variables `x` and `y` that equal `1` and `1.0001`, respectively. Then, check to see if `x` is greater than or equal to `y`.

In [None]:
x = 1
y = 1.0001
x >= y

False

#### **Lists and Sets**

Lists and sets look very similar, so what are the differences between the two?

First, lists are **ordered** and sets are not. For example, the set `{1, 2, 3}` is equal to the set `{2, 3, 1}` because they contain the same elements. However, the list `[1, 2, 3]` is not equal to the list `[2, 3, 1]` because even though they contain the same elements, they are not in the same order.

Just as we used logical operators to check if single elements are equal, we can also check if sets or lists are equal.









In [None]:
{1, 2, 3} == {2, 3, 1}

True

In [None]:
[1, 2, 3] == [2, 3, 1]

False

Because lists are ordered, they are also **indexed**. This means we can access elements within a list by referencing their index.

It's important to note here that Python begins indexing with 0. So, the "first" item in a list has index 0, not 1. To access an element by its index, we put the index within brackets, `[]`, after the list name.

In [None]:
#define a list that contains cities in Colorado
cities = ['Denver', 'Lakewood', 'Centennial', 'Littleton', 'Aurora']

#access the first element in the list
cities[0]

'Denver'

We can also use negative indices, which will start at the end of the list instead of the beginning. This can be a convenient way to see the last item in a list if you're not sure how long the list is.

In [None]:
#access the last element in the list
cities[-1]

'Aurora'

We can access more than just one element at a time by giving a range of indices, such as `[0:2]`. However, in Python, the first number given in the range is inclusive and the last one is not. So, the range `[0:2]` will provide the elements for indices `[0]` and `[1]`.

In [None]:
#access the first two elements in the list
cities[0:2]

['Denver', 'Lakewood']

If we leave off the starting index for a range, the range will start with the first item in the list by default. Similarly, if we leave off the ending index for a range, the range will end with the last item in the list by default.

In [None]:
#access all elements with an index of two or larger
cities[2:]

['Centennial', 'Littleton', 'Aurora']
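
As a further sketch of the default start and end indices, the block below leaves off the starting index, combines a negative index with a missing ending index, and leaves off both. It redefines the same five-element `cities` list so that it can run on its own; the other variable names are made up for illustration.

```python
# same list of Colorado cities defined earlier
cities = ['Denver', 'Lakewood', 'Centennial', 'Littleton', 'Aurora']

first_three = cities[:3]  # no starting index: begins at index 0
last_two = cities[-2:]    # no ending index: runs to the end of the list
everything = cities[:]    # no starting or ending index: the whole list

print(first_three)           # ['Denver', 'Lakewood', 'Centennial']
print(last_two)              # ['Littleton', 'Aurora']
print(everything == cities)  # True
```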

Because the items in a set are unordered, they don't have indices, which means we can't refer to them by an index.
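
To see what this looks like in practice, the sketch below tries to index a set. It uses `try` and `except`, which are not covered in this training, only so that the block keeps running after the error; the set `numbers` is a made-up example.

```python
# hypothetical example set
numbers = {1, 2, 3}

try:
    numbers[0]  # sets do not support indexing
    indexing_worked = True
except TypeError as error:
    indexing_worked = False
    print('TypeError:', error)

# we can still check whether an item belongs to the set
print(2 in numbers)  # True
```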

Lists and sets are both **mutable** collections of elements. An object is considered mutable if its data or attributes can be altered after it's created.

We can alter a set by adding or removing items with the `add()` and `remove()` functions. However, the elements in a set are unchangeable themselves (aside from removing them).







In [None]:
#define a set of states
states = {'California', 'Washington', 'Oregon', 'Nevada', 'Utah'}

#add 'Arizona' to the set
states.add('Arizona')

#view what elements are in 'states'
states

{'Arizona', 'California', 'Nevada', 'Oregon', 'Utah', 'Washington'}

In [None]:
#remove 'Washington' from the set
states.remove('Washington')

#view what elements are in 'states'
states

{'Arizona', 'California', 'Nevada', 'Oregon', 'Utah'}

We can alter a list by adding elements using the `append()` or `insert()` functions. The `append()` function will add an element to the end of the list and the `insert()` function will insert an element at a specified index.

In [None]:
#add 'Boulder' to the end of the cities list
cities.append('Boulder')

cities

['Denver', 'Lakewood', 'Centennial', 'Littleton', 'Aurora', 'Boulder']

In [None]:
#make 'Golden' the third element in the cities list
cities.insert(2, 'Golden')

cities

['Denver',
 'Lakewood',
 'Golden',
 'Centennial',
 'Littleton',
 'Aurora',
 'Boulder']

We can also alter a list by changing its elements.

In [None]:
#replace 'Littleton', the fifth element, with 'Denver'
cities[4] = 'Denver'

cities

['Denver', 'Lakewood', 'Golden', 'Centennial', 'Denver', 'Aurora', 'Boulder']

Notice the `cities` list contains two `'Denver'` elements now. Lists can contain **duplicate items**, whereas sets cannot. If we try to add a duplicate item to a set, it gets removed, but if we do the same in a list, it stays.

In [None]:
{1, 2, 3, 2}

{1, 2, 3}

In [None]:
[1, 2, 3, 2]

[1, 2, 3, 2]
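
The same behavior shows up when we add an item that is already present. The sketch below uses made-up fruit names: `add()` leaves the set unchanged, while `append()` keeps the duplicate in the list.

```python
# hypothetical example values
fruits_set = {'apple', 'banana'}
fruits_list = ['apple', 'banana']

fruits_set.add('apple')      # already present, so the set does not change
fruits_list.append('apple')  # lists keep the duplicate

print(fruits_set == {'apple', 'banana'})  # True
print(fruits_list)                        # ['apple', 'banana', 'apple']
```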

Lastly, if we want to check the length of a set or list, we can use the built-in `len()` function.

In [None]:
#check the length of the states set
len(states)

5

In [None]:
#check the length of the cities list
len(cities)

7

Although lists and sets often contain items of the same data type, there is no requirement that the items have the same data type. For example, the list below contains the string `'Denver'`, Denver's population, and the number of neighborhoods.

In [None]:
#define a list with multiple data types
denver = ['Denver', 2931000, 78]

#view the list
denver

['Denver', 2931000, 78]

The table below summarizes the similarities and differences between lists and sets.

| | Lists | Sets |
| --------- | :----- | :----- |
| Mutable | $\checkmark$ | $\checkmark$ |
| Ordered | $\checkmark$ |  |
| Allow for duplicate items | $\checkmark$  | |
| Indexed | $\checkmark$ | |
| Allow for different data types | $\checkmark$ | $\checkmark$ |

**Practice**: Create a list...

**Practice**: Create a set...

### **If Statements and Loops?**

### **Functions**

We've already used a few functions, like `print()` or `type()`. In general, functions take inputs called arguments and usually produce something in return. The Python documentation provides a complete [list of built-in functions](https://docs.python.org/3/library/functions.html), and the table below includes some of the commonly used functions.

| Function | Description |
| --------- | :----- |
| `print()` | prints the output |
| `sorted()` | returns a new list with the items sorted in ascending order |
| `type()` | returns the data type for an object |
| `abs()` | returns the absolute value of a number |
| `round()` | returns the number rounded to a specified number of decimal places |
| `sum()` | returns the sum of items in an object  |
| `len()` | returns the length of an object |
| `max()` | returns the largest item in an object |
| `min()` | returns the smallest item in an object |   
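
A few of the functions in the table above can be tried out with a short sketch. The list `values` and its contents are made up for illustration.

```python
# hypothetical example values
values = [3.14159, -2.5, 10, 7]

sorted_values = sorted(values)
print(sorted_values)          # [-2.5, 3.14159, 7, 10]

print(abs(-2.5))              # 2.5
print(round(3.14159, 2))      # 3.14
print(max(values))            # 10

# sorted() returns a new list, so the original list is unchanged
print(values)                 # [3.14159, -2.5, 10, 7]
```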

In [None]:
#define a list from 1 to 10
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

In [None]:
#add the elements in x together
sum(x)

55

In [None]:
#find the number of elements in x
len(x)

10

In [None]:
#find the smallest element in x
min(x)

1

We can also create our own custom functions. Functions can be extremely helpful if we've written code that will be used multiple times in the same script. Creating a function is a cleaner approach than just copying and pasting code.


Declaring a function consists of five parts:
1. The `def` keyword
2. The function name
  * The requirements for a function name are extremely similar to the variable name requirements. It's also recommended to avoid using the same name as an existing function, even though this is technically allowed.
3. Arguments
  * These are the values that are fed to the function.
4. The body of the function
5. A return statement
  * This is used to define the "output" of the function.
  * If you need to return more than one value from a function, you can use a list.


Let's start with something simple and define a function that adds two numbers. Python has the addition operator, `+`, so this is just to demonstrate the format for defining a function.

In [None]:
def addition(num1, num2): #def keyword, function name, and arguments
  total = num1 + num2     #body of the function (named 'total' to avoid reusing the built-in name 'sum')
  return total            #return statement

In [None]:
addition(1, 2)

3

**Practice**: Create a function named `summary` that accepts a list and returns the minimum, maximum, and length of the list. Check that your function works as expected by trying it out on the list, `y`, defined in the code block below. *Hint: You'll need to return a list in order to return the minimum, maximum, and length.*

In [None]:
#list to check function
y = [2, 3, 4, 5, 6, 7, 8, 9]

In [None]:
def summary(values):      #'values' avoids reusing the built-in name 'list'
  minimum = min(values)
  maximum = max(values)
  length = len(values)
  return [minimum, maximum, length]

In [None]:
summary(y)

[2, 9, 8]

### **Errors**

There are three types of errors that can occur when coding in Python: syntax errors, runtime errors, and logical errors.

Syntax refers to the rules that define the structure of a programming language, so **syntax errors** occur when the proper syntax is not followed. Some examples of syntax errors are leaving out a comma or a bracket.





In [None]:
print('hello'

SyntaxError: incomplete input (<ipython-input-1-0b37b907169d>, line 1)

**Runtime errors** occur when the syntax is correct, but the program can't run for a different reason, like dividing by zero or trying to access an object that doesn't exist.

In [None]:
#the name of the set is 'states' not state
state

NameError: name 'state' is not defined
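
Dividing by zero, the other runtime error mentioned above, can be sketched in the same way. The variable names are made up, and `try` and `except` (not covered in this training) are used only so that the block finishes running instead of stopping at the error.

```python
# hypothetical example values
numerator = 10
denominator = 0

try:
    result = numerator / denominator  # the syntax is fine, but this cannot be computed
except ZeroDivisionError as error:
    result = None
    print('ZeroDivisionError:', error)

print(result)  # None
```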

**Logical errors** are the most difficult to fix because there are no error messages. The code runs without any issues, but the result is incorrect due to flawed logic.

In [None]:
#define a list from 1 to 10
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

#find the sum of the first 5 elements (it should be 15)
sum(x[0:4])

10

The code above returned a value of 10 instead of the expected 15. This is because the last value in the range is exclusive.

Logical errors can often be fixed by double checking the results of intermediate steps in the code.

In [None]:
x[0:4] #only the first 4 elements, not 5

[1, 2, 3, 4]
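
One way to correct the logical error above is to end the range at index 5, since the last value in the range is exclusive. The sketch below redefines `x` so that it can run on its own; `first_five` is a name made up for illustration.

```python
# define a list from 1 to 10
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

# the range 0:5 covers indices 0, 1, 2, 3, and 4
first_five = x[0:5]
print(first_five)       # [1, 2, 3, 4, 5]
print(sum(first_five))  # 15
```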

### **Libraries**

Python has a lot of capabilities on its own, but we can do even more with the use of libraries. Libraries are collections of code and functions that extend the capabilities of Python. Some of the most popular libraries are pandas, NumPy, Plotly, and Matplotlib.

### **The pandas Library**

We'll start with the pandas library, which contains a variety of tools for doing things like:
* manipulating data
* reading and writing data
* slicing, indexing, and subsetting data
* aggregating or transforming data
* merging and joining of data sets
* time series analysis

The pandas library also has a [user guide](https://pandas.pydata.org/docs/user_guide/index.html) that covers a wide variety of topics and how pandas can be used to approach data analysis problems.

To use a library in Python, we use an `import` statement. It is recommended to import all libraries at the beginning of the script or notebook. Although this isn't necessary, it's common to import libraries under an alias or an alternate shorter name. For example, the pandas library is often imported with an alias of `pd`. This means the library can now be referred to as `pd` instead of `pandas`.


In [None]:
import pandas as pd

#### **Importing Data**

Let's import a dataset and perform some exploratory data analysis using the pandas library. Often we'll have datasets stored as csv files and pandas makes it easy to import a csv file with the `pd.read_csv()` function. The `pd.read_csv()` function takes a file path as the argument. This can be the path to a file stored on your computer or even a URL.
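
As a minimal sketch of how `pd.read_csv()` behaves, the block below reads a tiny, made-up CSV instead of a real file or URL. The data, the column names, and the variable names `csv_text` and `example_df` are all hypothetical. `io.StringIO` is a standard Python tool that lets pandas treat a piece of text as if it were a file.

```python
import io
import pandas as pd

# hypothetical CSV content, used in place of a real file path or URL
csv_text = """neighborhood,population
Alpha,1200
Beta,850
Gamma,430
"""

# read the text as if it were a csv file
example_df = pd.read_csv(io.StringIO(csv_text))

print(type(example_df))          # a pandas DataFrame
print(list(example_df.columns))  # ['neighborhood', 'population']
print(len(example_df))           # 3 rows
```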

We'll use the American Community Survey data for the Denver neighborhoods to test out some of the capabilities of pandas. Each row in the data set represents a neighborhood and each column contains demographic information for the neighborhoods.

*Use the P-Card data instead.*

In [None]:
#define a variable, 'path', that stores the URL for the data
#this step is optional and depends on style preference
path = 'https://github.com/kayley-smiley/Python-Training/blob/main/american_community_survey_nbrhd_2017_2021.csv?raw=true'

#read in the data and name it 'df'
df = pd.read_csv(path)

pandas has two widely used data structures: `Series` and `DataFrame`. You can think of a `Series` as a single column of data and a `DataFrame` as a dataset that contains many rows and columns. Let's check the structure of the dataset we just imported, `df`, using the `type()` function.

In [None]:
type(df)

`df` is a `DataFrame` because the dataset contains many rows and columns (or `Series`).

**Practice**: In the code block below, check the structure of one of the columns in `df`.

In [None]:
type(df['AGE65PLUS'])

As expected, it is a `Series`.

#### **Data Frames**
If you've just imported a data set and want to take a look at the first or last few rows, you can use the `DataFrame.head()` and `DataFrame.tail()` functions. To use either function, replace the word `DataFrame` with the name of a `DataFrame` object. In our case, this would be `df`. Both of these functions have an optional argument, `n`, to specify the number of rows you want returned. If you don't specify `n`, the function will default to 5 rows.

In [None]:
df.head()

Unnamed: 0,Shape,NBHD_NAME,TTL_POPULATION_ALL,HISPANIC_OR_LATINO,WHITE,BLACK,NATIVE_AMERICAN,ASIAN,HAWAIIAN_PI,OTHER_RACE,TWO_OR_MORE,PCT_HISPANIC,PCT_WHITE,PCT_BLACK,PCT_NATIVEAM,PCT_ASIAN,PCT_HAWAIIANPI,PCT_OTHERRACE,PCT_TWOORMORE_RACES,MALE,FEMALE,AGE_LESS_5,AGE_5_TO_9,AGE_10_TO_14,AGE_15_TO_17,AGE_0_TO_9,AGE_10_TO_19,AGE_20_TO_29,AGE_30_TO_39,AGE_40_TO_49,AGE_50_TO_59,AGE_60_TO_69,AGE_70_TO_79,AGE_80_PLUS,AGELESS18,AGE65PLUS,PCT_AGELESS18,PCT_AGE65PLUS,MEDIAN_AGE_ALL,MEDIAN_AGE_MALE,MEDIAN_AGE_FEMALE,TTL_AGE_3_PLUS_ENRSTATUS,ENROLLED_IN_SCHOOL,NURSERY_OR_PRESCHOOL,KINDERGARTEN,GRADES_1_TO_4,GRADES_5_TO_8,GRADES_9_TO_12,COLLEGE_UNDERGRADUATE,GRADUATE_SCHOOL,NOT_ENROLLED,TOTAL_COMMUTERS,COMMUTE_LESS_15,COMMUTE_15_TO_30,COMMUTE_30_TO_45,COMMUTE_45_TO_60,COMMUTE_60_TO_PLUS,TTLPOP_25PLUS_EDU,LESS_THAN_HS_DIPLOMA_EDU,HSGRAD_OR_EQUIV_EDU,SOMECOLLEGE_OR_AA_EDU,BACHELORS_OR_HIGHER_EDU,TTLPOP_5PLUS_LNG,ONLY_ENGLISH_LNG,SPANISH_LNG,TTL_HOUSING_UNITS,OCCUPIED_HU,VACANT_HU,OWNER_OCCUPIED_HU,RENTER_OCCUPIED_HU,TTL_HOUSEHOLDS,FAMILY_HOUSEHOLDS,MARRIED_COUPLE_FAMILY,OTHER_FAMILY,MALE_HHLDR_NO_WIFE_PRESENT,FEMALE_HHLDR_NO_HSBND_PRESENT,NONFAMILY_HOUSEHOLD,HOUSEHOLDER_ALONE,HOUSEHOLDER_NOT_ALONE,HH_INC_LESS_10000,HH_INC_10000_14999,HH_INC_15000_19999,HH_INC_20000_24999,HH_INC_25000_29999,HH_INC_30000_34999,HH_INC_35000_39999,HH_INC_40000_44999,HH_INC_45000_49999,HH_INC_50000_59999,HH_INC_60000_74999,HH_INC_75000_99999,HH_INC_100000_124999,HH_INC_125000_149999,HH_INC_150000_199999,HH_INC_OVER_200000,MED_HH_INCOME,MED_FAMILY_INCOME,PER_CAPITA_INCOME,AVG_HH_INCOME,AVG_FAM_INCOME,MEDIAN_EARNINGS,MEDIAN_EARN_MALE,MEDIAN_EARN_FEMALE,MEDEARN_LESSHS,MEDEARN_HIGHSCHOOL,MEDEARN_SOMECOLLEGE,MEDEARN_BACHELORS,MEDEARN_GRAD_PROFESSIONAL,MED_YR_STRUCTURE_BUILT,BUILT_2014_OR_LATER,BUILT_2010_2013,BUILT_2000_2009,BUILT_1990_1999,BUILT_1980_1989,BUILT_1970_1979,BUILT_1960_1969,BUILT_1950_1959,BUILT_1940_1949,BUILT_1939_OR_EARLIER,MED_CONTRACT_RENT,MED_GROSS_RENT,MEDIAN_HOME_VALUE,NATIVE,FOREIGN_BORN_FB,EUROPEAN_FB,NORTHERN_EUROPE_FB,WESTERN_EUROPE_FB,SOUTHERN_EUROPE_FB,EASTERN_EUROPE_FB,ASIA_FB,EASTERN_ASIA_FB,SOUTH_CENTRAL_ASIA_FB,SOUTH_EASTERN_ASIA_FB,WESTERN_ASIA_FB,AFRICA_FB,EASTERN_AFRICA_FB,MIDDLE_AFRICA_FB,SOUTHERN_AFRICA_FB,WESTERN_AFRICA_FB,OCEANIA_FB,AMERICAS_FB,LATIN_AMERICA_FB,CARRIBEAN_FB,CENTRAL_AMERICA_FB,SOUTH_AMERICA_FB,NORTH_AMERICA_FB,PCT_POVERTY,PCT_FAM_POVERTY
0,<geoprocessing describe geometry object object...,Washington Virginia Vale,14775.0,2316.0,8098.0,2642.0,13.0,665.0,0.0,180.0,861.0,15.675127,54.808799,17.881557,0.087986,4.500846,0.0,1.218274,5.827411,7643.0,7132.0,814.0,894.0,790.0,401.0,1708.0,1498.0,2909.0,2681.0,2243.0,1317.0,993.0,871.0,555.0,2899.0,2738.0,19.620981,18.531303,34.271727,32.868953,35.770833,14336,3199,223,227,741,476,636,651,245,11137,6853.0,1546.0,2538.0,2076.0,356.0,337.0,10710.0,541.0,1814.0,2659.0,5696.0,13961.0,10848.0,938.0,7353,6737,616,2878,3859,6737,3048,1914,1134,369,765,3689,2601,1088,383,247,187,197,400,201,326,634,355,698,674,654,504,376,422,479,56282,68623,38425,85537,101677,38374,40554,33180,0,21736,37190,38118,59192,1971.950244,0,33,200,661,854,2462,1884,1155,79,25,948,965,437046,12448,2327,542,36,0,55,451,641,68,334,118,121,683,106,41,0,129,22,439,396,13,318,65,43,13.175,7.084978
1,<geoprocessing describe geometry object object...,Washington Park West,7382.0,555.0,6167.0,59.0,0.0,202.0,0.0,0.0,399.0,7.518288,83.541046,0.799241,0.0,2.736386,0.0,0.0,5.405039,4028.0,3354.0,398.0,222.0,217.0,72.0,620.0,329.0,1677.0,2058.0,924.0,847.0,506.0,355.0,66.0,909.0,905.0,12.313736,12.25955,34.657434,33.761321,35.609218,7141,1028,132,59,117,205,94,217,204,6113,3705.0,688.0,1711.0,1031.0,150.0,125.0,5974.0,91.0,394.0,768.0,4721.0,6984.0,6410.0,278.0,4121,3887,234,1878,2009,3887,1417,1196,221,60,161,2470,1645,825,131,122,20,72,92,137,161,138,58,256,332,463,328,303,473,801,97920,154719,70630,134188,188029,62336,63980,60213,6458,0,20491,69479,88584,1939.0,0,455,185,50,56,128,125,413,192,2517,1036,1081,680135,6969,413,76,41,0,35,0,177,49,69,51,8,14,14,0,0,0,23,123,123,0,123,0,0,6.35,3.541058
2,<geoprocessing describe geometry object object...,Sun Valley,1133.0,505.0,88.0,457.0,35.0,46.0,0.0,0.0,2.0,44.571933,7.76699,40.335393,3.089144,4.060018,0.0,0.0,0.176523,553.0,580.0,155.0,168.0,167.0,123.0,323.0,323.0,54.0,299.0,29.0,40.0,34.0,15.0,16.0,613.0,52.0,54.104148,4.589585,16.8,12.8,30.8,1058,546,67,16,154,128,141,40,0,512,276.0,72.0,114.0,40.0,0.0,50.0,474.0,161.0,196.0,68.0,49.0,978.0,709.0,92.0,422,383,39,36,347,383,325,75,250,7,243,58,58,0,124,21,50,101,35,29,0,0,0,0,0,0,23,0,0,0,19650,20292,7970,22163,23720,16350,27552,11750,18015,16450,2499,0,0,1952.0,0,0,0,0,7,61,70,95,110,79,193,239,535700,996,137,1,0,0,1,0,27,0,0,27,0,91,19,55,0,17,0,18,18,0,18,0,0,77.7,75.384615
3,<geoprocessing describe geometry object object...,Cory - Merrill,4215.0,213.0,3572.0,15.0,3.0,240.0,0.0,6.0,166.0,5.053381,84.744958,0.355872,0.071174,5.69395,0.0,0.142349,3.938316,2215.0,2000.0,286.0,292.0,229.0,154.0,578.0,477.0,269.0,881.0,598.0,421.0,568.0,334.0,89.0,961.0,821.0,22.799526,19.478055,39.0,38.7,39.6,4038,1036,130,67,225,171,194,167,82,3002,1528.0,538.0,588.0,265.0,62.0,75.0,3030.0,21.0,202.0,437.0,2370.0,3929.0,3539.0,114.0,1783,1704,79,1403,301,1704,1132,1003,129,50,79,572,412,160,43,13,0,32,0,56,31,8,10,65,108,94,220,169,323,532,150441,182174,82192,201440,239710,95000,115387,73007,0,0,60769,109363,79659,1954.0,0,139,317,38,105,16,61,362,469,276,1942,1995,723700,3927,288,124,45,30,9,40,137,40,73,24,0,11,0,0,11,0,0,16,16,0,0,16,0,5.2,1.413428
4,<geoprocessing describe geometry object object...,Rosedale,2713.0,502.0,1911.0,142.0,11.0,77.0,0.0,0.0,70.0,18.503502,70.438629,5.234058,0.405455,2.838187,0.0,0.0,2.58017,1274.0,1439.0,183.0,68.0,91.0,19.0,251.0,110.0,689.0,573.0,364.0,210.0,254.0,123.0,139.0,361.0,422.0,13.306303,15.554736,34.3,36.0,33.2,2664,357,42,36,41,82,19,43,94,2307,1371.0,219.0,723.0,260.0,142.0,27.0,2246.0,82.0,188.0,540.0,1436.0,2530.0,2207.0,210.0,1489,1400,89,670,730,1400,489,425,64,57,7,911,624,287,52,0,16,63,33,46,113,20,21,124,112,151,136,64,299,150,89400,128393,54712,109105,146813,65087,71774,57036,0,0,36250,68988,80385,1952.0,0,87,10,17,112,95,195,300,305,368,1633,1671,593000,2534,179,37,6,11,0,20,40,0,0,24,16,0,0,0,0,0,0,102,102,24,62,16,0,7.5,0.0


In the output above, we can only see some of the columns. If we want to change this, we can use the `pd.set_option()` function with the `'display.max_columns'` argument to change the default number of columns displayed. Similarly, we can change the default number of rows displayed with the `'display.max_rows'` argument.

In [None]:
#change the number of columns displayed to 150
pd.set_option('display.max_columns', 150)

Now, try re-running the same code.

In [None]:
df.head()

**Practice**: Using the code block below, look at the last 8 rows of `df`.

In [None]:
df.tail(n=8)

Unnamed: 0,Shape,NBHD_NAME,TTL_POPULATION_ALL,HISPANIC_OR_LATINO,WHITE,BLACK,NATIVE_AMERICAN,ASIAN,HAWAIIAN_PI,OTHER_RACE,...,WESTERN_AFRICA_FB,OCEANIA_FB,AMERICAS_FB,LATIN_AMERICA_FB,CARRIBEAN_FB,CENTRAL_AMERICA_FB,SOUTH_AMERICA_FB,NORTH_AMERICA_FB,PCT_POVERTY,PCT_FAM_POVERTY
75,<geoprocessing describe geometry object object...,Westwood,17771.0,14627.0,2240.0,497.0,145.0,179.0,0.0,7.0,...,0,0,5398,5385,74,5291,20,13,30.025,28.412088
76,<geoprocessing describe geometry object object...,Villa Park,8859.0,6065.0,2346.0,244.0,61.0,43.0,0.0,0.0,...,0,0,1605,1605,35,1570,0,0,27.05,23.24218
77,<geoprocessing describe geometry object object...,Hampden South,16718.0,2697.0,11851.0,1336.0,57.0,320.0,0.0,15.0,...,0,0,840,790,17,709,64,50,5.925,3.428282


The table below contains some helpful functions from the pandas library.


| Function Name | Use |
| --------- | :----- |
| `DataFrame.info()` | provides a concise summary of a `DataFrame` |
| `DataFrame.dtypes` | provides the data types for each column in a `DataFrame` |
| `DataFrame.shape` | provides the number of rows and columns in a `DataFrame` |
| `DataFrame.describe()` | provides descriptive statistics for each column in a `DataFrame` |
| `DataFrame.sample()` | randomly selects a sample from a `Series` or `DataFrame` |
| `DataFrame.isna()` | returns a `DataFrame` filled with boolean values indicating missing values |
| `DataFrame.dropna()` | removes missing values |
| `DataFrame.sort_values()` | sorts the values in a `DataFrame` in ascending or descending order based on one or more columns |
| `DataFrame.nunique()` | returns a `Series` with the number of distinct elements in each column |
| `Series.unique()` | returns the unique values in a `Series` (a single column) in order of appearance |
| `DataFrame.value_counts()` | returns a `Series` containing the counts of unique values |
| `DataFrame.groupby()` | groups a `DataFrame` by values in one or more columns |
| `pd.to_datetime()` | converts to a datetime object |
| `pd.to_numeric()` | converts to a numeric object |
| `DataFrame.duplicated()` | returns a boolean `Series` denoting duplicate rows |
| `DataFrame.drop_duplicates()` | returns a `DataFrame` with duplicate rows removed |
| `DataFrame.merge()` | merges two `DataFrame` objects |
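
A few of the tools in the table above can be tried out on a small, made-up `DataFrame`. This is only a sketch: the data, the column names, and the variable names are hypothetical and are not taken from the data set used in this training.

```python
import pandas as pd

# hypothetical data; the column names are made up for illustration
example_df = pd.DataFrame({
    'neighborhood': ['Alpha', 'Beta', 'Gamma', 'Delta'],
    'region': ['north', 'south', 'north', 'north'],
    'population': [1200, 850, 430, 2100],
})

print(example_df.shape)   # (4, 3), meaning 4 rows and 3 columns
print(example_df.dtypes)  # data type of each column

# sort from the largest to the smallest population
by_population = example_df.sort_values('population', ascending=False)
print(by_population['neighborhood'].tolist())  # ['Delta', 'Alpha', 'Beta', 'Gamma']

# count how many neighborhoods fall in each region
region_counts = example_df['region'].value_counts()
print(region_counts['north'])  # 3

# number of distinct values in each column
distinct_counts = example_df.nunique()
print(distinct_counts['region'])  # 2
```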




### **Practice**

*Assign some EDA tasks using functions described above and concepts from the first half.*

### **Sources**

* *Starting Out with Python*, 3rd edition, by Tony Gaddis
* W3Schools Python Tutorial
* pandas User Guide
* GeeksforGeeks website

