# Review



In [1]:
%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import sympy as sy

##### Problem: Summations

Use the summation formulas below to evaluate the given summations.

$$\sum_{i = 1}^n i = \frac{n^{2}}{2} + \frac{n}{2}
$$

$$\sum_{i = 1}^n i^2 = \frac{n^{3}}{3} + \frac{n^{2}}{2} + \frac{n}{6}
$$

$$\sum_{i = 1}^n i^3 = \frac{n^{4}}{4} + \frac{n^{3}}{2} + \frac{n^{2}}{4}
$$

a. $\sum_{i = 1}^{10} (i^2 - i)$

b. $\sum_{i = 1}^{20} (4i^3 + i^2 - 6)$

c. $\sum_{i = 4}^{10} 2i^2$ 
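One way to check work on sums like these is to compare a closed-form evaluation against a brute-force sum. The sketch below does this for a different sum, $\sum_{i = 3}^{8} (2i^2 + i - 1)$, chosen only for illustration. The helper names `sum_i` and `sum_i2` are hypothetical. It relies on linearity and on the fact that a sum starting at $i = 3$ equals the sum from 1 to 8 minus the sum from 1 to 2.

```python
from fractions import Fraction

def sum_i(n):
    """Closed form for 1 + 2 + ... + n."""
    return Fraction(n**2, 2) + Fraction(n, 2)

def sum_i2(n):
    """Closed form for 1^2 + 2^2 + ... + n^2."""
    return Fraction(n**3, 3) + Fraction(n**2, 2) + Fraction(n, 6)

# Illustrative sum (not one of the exercises): sum_{i=3}^{8} (2 i^2 + i - 1).
lower, upper = 3, 8
n_terms = upper - lower + 1

# Linearity splits the sum into three pieces; each piece with a lower
# limit of 3 is (sum from 1 to 8) minus (sum from 1 to 2).
closed_form = (
    2 * (sum_i2(upper) - sum_i2(lower - 1))
    + (sum_i(upper) - sum_i(lower - 1))
    - 1 * n_terms
)

# Brute-force check by adding the terms directly.
brute_force = sum(2 * i**2 + i - 1 for i in range(lower, upper + 1))

print(closed_form, brute_force)
```

Note that a constant term contributes the constant times the number of terms, which is `upper - lower + 1`, not `upper`.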

##### Problem: Mean, Variance, Standard Deviation

Below is a sample of historical NYPD arrest data.  We count the arrests for each of the 20 most frequent offense descriptions (`pd_desc`). Use these counts to compute the *mean number of incidents*, the *variance in incident counts*, and the *standard deviation of incident counts*.

In [24]:
nyc_arrests = pd.read_json('https://data.cityofnewyork.us/resource/8h9b-rp9u.json')

In [25]:
nyc_arrests.head()

Unnamed: 0,arrest_key,arrest_date,pd_cd,pd_desc,ky_cd,ofns_desc,law_code,law_cat_cd,arrest_boro,arrest_precinct,jurisdiction_code,age_group,perp_sex,perp_race,x_coord_cd,y_coord_cd,latitude,longitude
0,173130602,2017-12-31T00:00:00.000,566,"MARIJUANA, POSSESSION",678.0,MISCELLANEOUS PENAL LAW,PL 2210500,V,Q,105,0,25-44,M,BLACK,1063056,207463,40.735772,-73.715638
1,173114463,2017-12-31T00:00:00.000,478,"THEFT OF SERVICES, UNCLASSIFIED",343.0,OTHER OFFENSES RELATED TO THEFT,PL 1651503,M,Q,114,0,25-44,M,ASIAN / PACIFIC ISLANDER,1009113,219613,40.769437,-73.910241
2,173113513,2017-12-31T00:00:00.000,849,"NY STATE LAWS,UNCLASSIFIED VIOLATION",677.0,OTHER STATE LAWS,LOC000000V,V,K,73,1,18-24,M,BLACK,1010719,186857,40.679525,-73.904572
3,173113423,2017-12-31T00:00:00.000,101,ASSAULT 3,344.0,ASSAULT 3 & RELATED OFFENSES,PL 1200001,M,M,18,0,25-44,M,WHITE,987831,217446,40.763523,-73.987074
4,173113421,2017-12-31T00:00:00.000,101,ASSAULT 3,344.0,ASSAULT 3 & RELATED OFFENSES,PL 1200001,M,M,18,0,45-64,M,BLACK,987073,216078,40.759768,-73.989811


In [26]:
nyc_arrests['pd_desc'].value_counts().nlargest(20).to_frame()

Unnamed: 0,pd_desc
ASSAULT 3,142
"LARCENY,PETIT FROM OPEN AREAS,UNCLASSIFIED",89
"TRAFFIC,UNCLASSIFIED MISDEMEAN",74
"MARIJUANA, POSSESSION 4 & 5",67
"INTOXICATED DRIVING,ALCOHOL",53
"ASSAULT 2,1,UNCLASSIFIED",47
"ROBBERY,UNCLASSIFIED,OPEN AREAS",37
"THEFT OF SERVICES, UNCLASSIFIED",32
"LARCENY,GRAND FROM OPEN AREAS,UNCLASSIFIED",29
"CONTROLLED SUBSTANCE, POSSESSION 7",25

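The three statistics can be computed directly from their definitions. The sketch below uses a small hypothetical `Series` of counts rather than the arrest data, and it assumes the population formulas (dividing by $n$), since the problem does not say whether the population or sample version is intended.

```python
import numpy as np
import pandas as pd

# Hypothetical counts, used only to illustrate the calculations.
counts = pd.Series([2, 4, 4, 4, 5, 5, 7, 9])

n = len(counts)
mean = counts.sum() / n

# Population variance: average squared deviation from the mean.
variance = ((counts - mean) ** 2).sum() / n
std_dev = np.sqrt(variance)

print(mean, variance, std_dev)

# pandas divides by n - 1 by default (sample variance); pass ddof=0
# to match the population formulas above.
print(counts.var(ddof=0), counts.std(ddof=0))
```

If the sample versions are wanted instead, `counts.var()` and `counts.std()` with their default `ddof=1` give them.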

##### Problem: Root Mean Squared Error

Suppose we build a model to predict the price of an apartment using the square footage and bedrooms as follows:

$$c(x,y) = 10x + 1.2y + 200$$

where $x$ represents square footage and $y$ the number of bedrooms.  The **Root Mean Squared Error** is defined by:

$$RMSE = \sqrt{\frac{\sum_{i = 1}^n (\hat{y}_i - y_i)^2 }{n}}$$

that is, the square root of the mean squared error between the predicted cost $\hat{y}_i$ and the real cost $y_i$ over the $n$ observations. (Here $y_i$ denotes a real cost, not the bedroom count $y$ from the model above.)

Use the formula to find the **RMSE** of our model's predictions below.

<table border="1" class="dataframe">  <thead>    <tr style="text-align: right;">      <th></th>      <th>square footage</th>      <th>bedrooms</th>      <th>real_cost</th>      <th>predicted_cost</th>    </tr>  </thead>  <tbody>    <tr>      <th>0</th>      <td>400</td>      <td>1</td>      <td>4300.76</td>      <td>4201.2</td>    </tr>    <tr>      <th>1</th>      <td>500</td>      <td>1</td>      <td>5301.01</td>      <td>5201.2</td>    </tr>    <tr>      <th>2</th>      <td>600</td>      <td>3</td>      <td>6302.63</td>      <td>6203.6</td>    </tr>    <tr>      <th>3</th>      <td>700</td>      <td>1</td>      <td>7300.91</td>      <td>7201.2</td>    </tr>    <tr>      <th>4</th>      <td>800</td>      <td>1</td>      <td>8301.07</td>      <td>8201.2</td>    </tr>    <tr>      <th>5</th>      <td>900</td>      <td>3</td>      <td>9303.85</td>      <td>9203.6</td>    </tr>    <tr>      <th>6</th>      <td>1000</td>      <td>2</td>      <td>10302.35</td>      <td>10202.4</td>    </tr>  </tbody></table>
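The RMSE formula translates almost term for term into NumPy. The sketch below uses hypothetical values rather than the table above, and the function name `rmse` is an illustrative choice.

```python
import numpy as np

def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    # Square the errors, average them, then take the square root.
    return np.sqrt(np.mean((predicted - actual) ** 2))

# Hypothetical values, not the table above.
actual = [3.0, 5.0, 8.0]
predicted = [2.0, 5.0, 10.0]

# Errors are -1, 0, 2, so the squared errors are 1, 0, 4
# and the result is sqrt(5 / 3).
print(rmse(actual, predicted))
```

To apply it to the table, pass the `real_cost` and `predicted_cost` columns as the two arguments.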

##### Problem: Area under a curve

Evaluate the definite integrals below and represent the solution visually as the area under the curve $f(x)$.

a. $\int_{1}^3 \left(x^3 - \frac{1}{x}\right) dx$

b. $\int_{\pi}^{4\pi} 2 \cos(x) dx$

c. $\int_4^9 1.04(5)^x dx$
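A minimal sketch of the workflow, assuming SymPy for the exact integral and Matplotlib for the picture. The integrand $x^2$ on $[0, 2]$ is an illustrative choice, not one of the exercises.

```python
import matplotlib.pyplot as plt
import numpy as np
import sympy as sy

x = sy.Symbol('x')
f = x**2                      # illustrative integrand, not one of the exercises
a, b = 0, 2                   # limits of integration

# Exact value of the definite integral.
area = sy.integrate(f, (x, a, b))
print(area)

# Turn the symbolic expression into a NumPy-ready function for plotting.
f_num = sy.lambdify(x, f, 'numpy')
xs = np.linspace(a - 0.5, b + 0.5, 400)    # a little wider than [a, b]
shade = np.linspace(a, b, 400)             # only [a, b] gets shaded

fig, ax = plt.subplots()
ax.plot(xs, f_num(xs), label='$f(x) = x^2$')
ax.fill_between(shade, f_num(shade), alpha=0.3, label=f'area = {area}')
ax.axhline(0, color='black', linewidth=0.5)
ax.legend()
```

Where a curve dips below the $x$-axis, the definite integral counts that region as negative, so the shaded picture and the value of the integral can differ from the total geometric area.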

##### Problem: Area between curves

Given the functions:

$$f(x) = 1 + x + e^{x^2 - 2x} \quad g(x) = x^4 - 6.5x^2 + 6x + 2$$

define the regions R and S shown below.



<center>
    <img src = images/a4p4.png />
    </center>

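1. Verify that the curves intersect at $x = 0$ and $x = 2$, and find the third intersection numerically. It lies near, but not exactly at, $x = 1$, since $f(1) = 2 + e^{-1} \approx 2.368$ while $g(1) = 2.5$.

2. Set up definite integrals to represent the areas $R$ and $S$.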

3. Evaluate the integrals using technology.
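One possible way to carry out these steps with SymPy is sketched below. Substituting candidate points shows that $f - g$ vanishes at $x = 0$ and $x = 2$ but not at $x = 1$, so the middle crossing is located numerically with `nsolve`, started near $x = 1$. The variable names are illustrative, and the choice of which curve is on top in each region is read from the figure.

```python
import sympy as sy

x = sy.Symbol('x')
f = 1 + x + sy.exp(x**2 - 2*x)
g = x**4 - sy.Rational(13, 2)*x**2 + 6*x + 2

# Check candidate intersection points by substitution:
# a difference of zero means the curves meet there.
for point in (0, 1, 2):
    print(point, sy.N(f.subs(x, point) - g.subs(x, point)))

# Solve f(x) = g(x) numerically, starting the search near x = 1.
middle = sy.nsolve(f - g, x, 1)
print(middle)

# Numerically integrate (upper curve) - (lower curve) on each side.
# From the figure, g lies above f on R (left) and f above g on S (right).
area_R = sy.Integral(g - f, (x, 0, middle)).evalf()
area_S = sy.Integral(f - g, (x, middle, 2)).evalf()
print(area_R, area_S)
```

Integrating upper minus lower on each piece keeps both areas positive. Integrating $f - g$ across the whole interval $[0, 2]$ would instead let the two regions partly cancel.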

##### Problem: Volumes and Revolution

1. Find the volume of the solid generated by rotating the region bounded by $y = x$, $x = 0$, and $y = (x-1)^2 + 1$.  Sketch an image of this region or try to use Python to visualize.

2. Find the volume of the solid formed by rotating the region $R$ from the previous problem about the $x$-axis.  Sketch an image of this region.
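A minimal sketch of the washer method for rotation about the $x$-axis, using an illustrative region between $y = x$ and $y = x^2$ rather than either region above. The names `outer` and `inner` are hypothetical labels for the curves farther from and closer to the axis of rotation.

```python
import sympy as sy

x = sy.Symbol('x')

# Illustrative region (not one of the exercises): between y = x and y = x**2.
outer = x          # curve farther from the axis of rotation on [0, 1]
inner = x**2       # curve closer to the axis of rotation on [0, 1]

# The curves meet where outer == inner; these are the limits of integration.
limits = sorted(sy.solve(sy.Eq(outer, inner), x))
a, b = limits[0], limits[-1]

# Washer method about the x-axis: V = pi * integral of (outer^2 - inner^2).
volume = sy.pi * sy.integrate(outer**2 - inner**2, (x, a, b))
print(volume)
```

The radii are squared separately before subtracting. Squaring the difference `(outer - inner)**2` is a common mistake and gives a different, incorrect volume.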



##### Problem: Gini Index

The World Bank provides access to data about world Gini indices [here](https://data.worldbank.org/indicator/SI.POV.GINI?end=2017&start=1985).  Look up a country of your choice.  What does the Gini index say about this country?

The United States Census Bureau gathers and provides data related to income and poverty in the United States.  Visit their site [here](https://www.census.gov/library/publications/2019/demo/p60-266.html), and explore the data available.  Download one data table and discuss the information you found and what it says about income and poverty in the United States.

In [None]:
# def c(x, y): return 10*x + 1.2*y + 200

# x = np.arange(400, 1100, 100)

# y = np.random.randint(1, 4, 7)

# zhat = np.round(c(x, y), 2)

# z = zhat + np.round(np.random.normal(100, size = 7), 2)

# df = pd.DataFrame({'square footage': x, 'bedrooms': y, 'real_cost': z, 'predicted_cost': zhat})
# df.to_html()

In [None]:
# def f(x): return 1 + x + np.e**(x**2 - 2*x)
# def g(x): return x**4 - 6.5*x**2 + 6*x + 2
# x = np.linspace(0, 2, 1000)
# plt.plot(x, f(x), label = '$f(x)$')
# plt.plot(x, g(x), label = '$g(x)$')
# plt.legend()
# plt.fill_between(x, f(x), g(x), color = 'grey', alpha = 0.1)
# plt.text(0.4, 2.5, 'R', fontsize = 30)
# plt.text(1.4, 2.2, 'S', fontsize = 30)
# plt.savefig('images/a4p4.png')