# Descriptive Statistics in Python Exercise - Module 1

In this exercise we will use a dataset of individual fundraising campaigns created on the [GoFundMe](https://gofundme.com) website. The data comes from a [project on GitHub](https://github.com/lmeninato/GoFundMe/) that collected information about GoFundMe projects in 2018.

You will apply your knowledge of descriptive statistics and skills from the data wrangling course to summarize information about specific categories of projects. I've stubbed out a series of steps below. I will describe each task and leave an open code block for you to complete the task. Please use text blocks to summarize your analysis. Use your own knowledge and the [Module 1 example descriptive stats notebook](https://github.com/digitalshawn/STC551/blob/main/Module%201/Descriptive%20Stats%20Example.ipynb) as a guide, but you may use other techniques to answer the prompts.

## What to submit via Canvas

Download a copy of your completed notebook from Google Colab (File --> Download --> Download .ipynb) and upload it to Canvas for this assignment. Please make sure that you run all code blocks so I can see the output when I open the notebook file.

## Help! I have questions!

You may email me with questions or ask to set up a Zoom meeting so we can look at your code together. You may also use the Canvas discussion board to ask questions and share tips. While I ask that you do not collaborate on answers, you may discuss the assignment via Canvas. Keeping any discussions public allows everyone to benefit!

# Let's Get Started!

### Task hints

*   `instructions in this style require you to write and execute python code in a code block`
*   instructions in this style require you to write a summary, analysis, or explanation in a text block




Here we load the modules we will use in this script. They are the same modules that are used in the [example notebook](https://github.com/digitalshawn/STC551/blob/main/Module%201/Descriptive%20Stats%20Example.ipynb).

In [2]:
import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
import plotly.express as px # accessible module for plotting graphs
from scipy.stats import skew, kurtosis # to analyze the skew of our dataset
import plotly.figure_factory as ff

# Loading the GoFundMe Data

Below we load the GoFundMe data directly from its GitHub URL. Briefly take a look [at the data file](https://raw.githubusercontent.com/lmeninato/GoFundMe/master/data-raw/GFM_data.csv). You'll see that although the file name ends in .csv, the fields are delimited (separated) by tabs rather than commas. I've flagged this for pandas' `read_csv()` function by setting the `sep` argument to a tab (`\t`).



In [3]:
df = pd.read_csv("https://raw.githubusercontent.com/lmeninato/GoFundMe/master/data-raw/GFM_data.csv", sep="\t")
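
To see why the `sep` argument matters, here is a minimal sketch that parses the same text twice, once with the default comma separator and once with a tab. The two-row sample string is made up for illustration and only mimics the layout of the real file.

```python
import io
import pandas as pd

# Hypothetical two-row sample that mimics the tab-delimited layout of GFM_data.csv
sample = "Category\tAmount_Raised\tGoal\nMedical\t327345.0\t15000\nWishes\t10265.0\t15000\n"

# Default comma separator: every line is read as one single column
as_csv = pd.read_csv(io.StringIO(sample))

# Tab separator: the three fields are split into three columns
as_tsv = pd.read_csv(io.StringIO(sample), sep="\t")

print("comma-separated parse:", as_csv.shape)
print("tab-separated parse:  ", as_tsv.shape)
```

If a freshly loaded frame has only one column whose name contains all the headers, a wrong separator is a likely cause.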


# Let's explore the data file

1.   `show the first few rows of the data file.`
2.   List and describe the meaning of each column
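
A quick way to approach both steps is sketched below on a small made-up frame; the column names are borrowed from the GoFundMe data, but the values are illustrative only. `head()`, `columns`, and `dtypes` are standard pandas attributes.

```python
import pandas as pd

# Hypothetical miniature frame using a few of the GoFundMe column names
toy = pd.DataFrame({
    "Category": ["Medical", "Wishes"],
    "Amount_Raised": [327345.0, 10265.0],
    "Goal": ["15000", "1.0M"],  # stored as text, like the raw Goal column
})

print(toy.head())             # first rows (up to five by default)
print(toy.columns.tolist())   # column names to describe
print(toy.dtypes)             # a non-numeric dtype hints that cleaning is needed
```

Checking `dtypes` early helps explain later surprises: a column that looks numeric on screen may still be stored as text.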






In [4]:
df

Unnamed: 0.1,Unnamed: 0,Url,Category,Position,Title,Location,Amount_Raised,Goal,Number_of_Donators,Length_of_Fundraising,FB_Shares,GFM_hearts,Text,Latitude,Longitude
0,0,https://www.gofundme.com/3ctqm-medical-bills-f...,Medical,0,92 Yr old Man Brutally Attacked.,"LOS ANGELES, CA",327345.0,15000,12167,1 month,26k,12k,Rodolfo Rodriguez needs your help today! 92 Yr...,34.052234,-118.243685
1,1,https://www.gofundme.com/olivia-stoy-bone-marr...,Medical,0,Olivia Stoy:Transplant & Liv it up!,"ASHLEY, IN",316261.0,1.0M,5598,3 months,12k,5.7k,Thomas Stoy needs your help today! Olivia Stoy...,41.527273,-85.065523
2,2,https://www.gofundme.com/autologous-Tcell-Tran...,Medical,1,AUTOLOGOUS T CELL TRANSPLANT,"STATEN ISLAND, NY",241125.0,250000,841,2 months,1.8k,836,Philip Defonte needs your help today! AUTOLOGO...,40.579532,-74.150201
3,3,https://www.gofundme.com/a-chance-of-rebirth,Medical,1,A chance of rebirth,"DUBLIN, CA",237424.0,225000,4708,1 month,9.7k,4.7k,Sriram Kanniah needs your help today! A chance...,37.702152,-121.935792
4,4,https://www.gofundme.com/teamclaire,Medical,1,Claire Wineland Needs Our Help,"GARDEN GROVE, CA",236590.0,225000,8393,2 months,6.4k,8.9k,Melissa Yeager needs your help today! Claire W...,33.774269,-117.937995
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1237,1237,https://www.gofundme.com/help-send-michelle-to...,Wishes,22,Help Send Michelle to Israel!,"DELL RAPIDS, SD",10370.0,8000,103,5 months,398,106,Michelle Serlet needs your help today! Help Se...,43.826084,-96.706161
1238,1238,https://www.gofundme.com/support-alvins-family,Wishes,22,Support Alvin's Family,"MONROE, GA",10349.0,15000,185,1 month,977,194,Kalvin Ahmed needs your help today! Support Al...,33.794836,-83.713229
1239,1239,https://www.gofundme.com/nuclear-medicine-tech...,Wishes,23,College & Medical expenses,"DALLAS, TX",10330.0,50000,9,27 days,66,11,Anjelica Vossler needs your help today! Colleg...,32.776664,-96.796988
1240,1240,https://www.gofundme.com/girls-junior-national...,Wishes,23,Rhonda's Wish,"San Antonio TX 78218, US",10265.0,15000,109,3 months,476,114,Anesi Maverick Tuufuli needs your help today! ...,29.489578,-98.385532


In [13]:
# Helper defined here for later use: converts abbreviated counts such as "26k" or "1.0M" to floats.

def value_to_float(x):

    if type(x) == float or type(x) == int:
        return x

    if 'k' in x:
        if len(x) > 1:
            return float(x.replace('k', '')) * 1000
        return 1000.0

    if 'K' in x:
        if len(x) > 1:
            return float(x.replace('K', '')) * 1000
        return 1000.0

    if 'm' in x:
        if len(x) > 1:
            return float(x.replace('m', '')) * 1000000
        return 1000000.0

    if 'M' in x:
        if len(x) > 1:
            return float(x.replace('M', '')) * 1000000
        return 1000000.0

    if 'b' in x:
        return float(x.replace('b', '')) * 1000000000

    if ',' in x:
        return float(x.replace(',', ''))

    return x
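
Note that the final `return x` above hands back any string without a recognized suffix or comma unchanged, so a value such as `"836"` would stay a string. The sketch below is a hypothetical variant, not the notebook's helper, that always returns a float and maps unparseable text to NaN. The name `to_number` and the `SUFFIXES` table are assumptions made for illustration.

```python
import math

# Multipliers for the suffixes seen in the data ("26k", "1.0M"); "b" mirrors the helper above
SUFFIXES = {"k": 1_000, "m": 1_000_000, "b": 1_000_000_000}

def to_number(x):
    """Hypothetical variant of value_to_float that always returns a float."""
    if isinstance(x, (int, float)):
        return float(x)
    # Normalize: drop surrounding spaces and thousands separators, ignore case
    text = str(x).strip().replace(",", "").lower()
    multiplier = 1
    if text and text[-1] in SUFFIXES:
        multiplier = SUFFIXES[text[-1]]
        text = text[:-1] or "1"   # a bare "k" is treated as 1000, as in the helper above
    try:
        return float(text) * multiplier
    except ValueError:
        return math.nan           # unparseable text becomes NaN instead of staying a string

print(to_number("26k"), to_number("836"), to_number("1,234"), to_number("n/a"))
```

Because every return value is a float, a column passed through this variant with `apply` should end up with a numeric dtype, which is what `mean()` and `var()` expect.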

In [14]:
df.FB_Shares = df.FB_Shares.apply(value_to_float)
df.GFM_hearts = df.GFM_hearts.apply(value_to_float)
df.Goal = df.Goal.apply(value_to_float)

df.head()

Unnamed: 0.1,Unnamed: 0,Url,Category,Position,Title,Location,Amount_Raised,Goal,Number_of_Donators,Length_of_Fundraising,FB_Shares,GFM_hearts,Text,Latitude,Longitude
0,0,https://www.gofundme.com/3ctqm-medical-bills-f...,Medical,0,92 Yr old Man Brutally Attacked.,"LOS ANGELES, CA",327345.0,15000.0,12167,1 month,26000.0,12000.0,Rodolfo Rodriguez needs your help today! 92 Yr...,34.052234,-118.243685
1,1,https://www.gofundme.com/olivia-stoy-bone-marr...,Medical,0,Olivia Stoy:Transplant & Liv it up!,"ASHLEY, IN",316261.0,1000000.0,5598,3 months,12000.0,5700.0,Thomas Stoy needs your help today! Olivia Stoy...,41.527273,-85.065523
2,2,https://www.gofundme.com/autologous-Tcell-Tran...,Medical,1,AUTOLOGOUS T CELL TRANSPLANT,"STATEN ISLAND, NY",241125.0,250000.0,841,2 months,1800.0,836.0,Philip Defonte needs your help today! AUTOLOGO...,40.579532,-74.150201
3,3,https://www.gofundme.com/a-chance-of-rebirth,Medical,1,A chance of rebirth,"DUBLIN, CA",237424.0,225000.0,4708,1 month,9700.0,4700.0,Sriram Kanniah needs your help today! A chance...,37.702152,-121.935792
4,4,https://www.gofundme.com/teamclaire,Medical,1,Claire Wineland Needs Our Help,"GARDEN GROVE, CA",236590.0,225000.0,8393,2 months,6400.0,8900.0,Melissa Yeager needs your help today! Claire W...,33.774269,-117.937995


*list and description of column headers go here*

# Campaigns by Category



1.   `How many campaigns are in each category?`
2.   `What is the average $ amount raised in each category?`
3.   `What is the average fundraising goal in each category?`
4.   Provide a text summary of the results

*feel free to use multiple code blocks if you'd like*



In [15]:
df.Category.value_counts()

Medical        76
Memorial       72
Volunteer      72
Travel         72
Sports         72
Newlywed       72
Family         72
Faith          72
Event          72
Creative       72
Competition    72
Community      72
Business       72
Education      72
Charity        72
Emergency      72
Wishes         72
Animals        10
11525.0         1
-73.9495823     1
-75.3199035     1
Name: Category, dtype: int64
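
The last three entries in the counts above (`11525.0`, `-73.9495823`, `-75.3199035`) look like numbers rather than category names, which suggests a few rows whose fields are shifted. One possible way to set such rows aside is sketched below on made-up data; treating "parses as a number" as the test for a malformed category is an assumption, not something the dataset documents.

```python
import pandas as pd

# Hypothetical frame with three valid categories and two misaligned rows
toy = pd.DataFrame({
    "Category": ["Medical", "Medical", "Wishes", "11525.0", "-73.9495823"],
    "Amount_Raised": [327345.0, 316261.0, 10265.0, 688.0, None],
})

# Assumption: a Category value that parses as a number marks a misaligned row
looks_numeric = pd.to_numeric(toy["Category"], errors="coerce").notna()
clean = toy[~looks_numeric]

print(clean["Category"].value_counts())
```

It is worth reporting how many rows were excluded so the summary statistics can be interpreted correctly.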

In [16]:
df_grouped = df.groupby('Category')

In [17]:
for group_name, df_group in df_grouped:
  print("Mean $ Raised:", group_name, df_group["Amount_Raised"].mean())

Mean $ Raised: -73.9495823 nan
Mean $ Raised: -75.3199035 nan
Mean $ Raised: 11525.0 688.0
Mean $ Raised: Animals 98085.4
Mean $ Raised: Business 11813.430555555555
Mean $ Raised: Charity 65931.91666666667
Mean $ Raised: Community 120226.7042253521
Mean $ Raised: Competition 5570.375
Mean $ Raised: Creative 25302.347222222223
Mean $ Raised: Education 45777.86111111111
Mean $ Raised: Emergency 116201.01388888889
Mean $ Raised: Event 10978.422535211268
Mean $ Raised: Faith 12903.785714285714
Mean $ Raised: Family 63499.86111111111
Mean $ Raised: Medical 147340.40789473685
Mean $ Raised: Memorial 115498.94444444444
Mean $ Raised: Newlywed 3478.8169014084506
Mean $ Raised: Sports 19540.125
Mean $ Raised: Travel 7099.871428571429
Mean $ Raised: Volunteer 13642.472222222223
Mean $ Raised: Wishes 23230.583333333332


In [18]:
for group_name, df_group in df_grouped:
  print("Mean Goal:", group_name, df_group["Goal"].mean())

Mean Goal: -73.9495823 nan
Mean Goal: -75.3199035 nan
Mean Goal: 11525.0 141.0
Mean Goal: Animals 98500.0
Mean Goal: Business 36416.208333333336


TypeError: ignored
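
The `TypeError` above is consistent with a `Goal` column that still mixes floats and strings, although the truncated traceback does not confirm this. As a sketch under that assumption, `pd.to_numeric` with `errors="coerce"` converts what it can and turns the rest into NaN, after which a single `groupby(...).agg(...)` call can answer all three questions at once. The data and the output column names (`campaigns`, `mean_raised`, `mean_goal`) are illustrative.

```python
import pandas as pd

# Hypothetical data with a Goal column that mixes floats and strings
toy = pd.DataFrame({
    "Category": ["Medical", "Medical", "Wishes", "Wishes"],
    "Amount_Raised": [300.0, 100.0, 50.0, 150.0],
    "Goal": [15000.0, "250000", 8000.0, "not available"],
})

# Coerce to numeric; unparseable entries become NaN and are skipped by mean()
toy["Goal"] = pd.to_numeric(toy["Goal"], errors="coerce")

summary = toy.groupby("Category").agg(
    campaigns=("Amount_Raised", "size"),    # number of rows per category
    mean_raised=("Amount_Raised", "mean"),
    mean_goal=("Goal", "mean"),
)
print(summary)
```

Compared with looping over the groups and printing, the aggregated frame keeps the results in one table that can be sorted or plotted.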

*summarize output here*

# Looking for outliers in shares and hearts



1.   `Select 3 categories and create a boxplot of the FB shares and GFM hearts`
2.   `Plot the outliers in the boxplot`
3.   `Calculate the mean, median, mode, standard deviation, and variance for the 3 categories' FB shares and GFM hearts`
4.   Summarize these results. What conclusions can you come to about these results?
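
As a rough illustration of what a boxplot flags as an outlier, the sketch below applies the common 1.5 × IQR rule to a made-up series and then prints the requested summary measures. The numbers are hypothetical, and pandas' default quantile interpolation may differ slightly from the quartile method a plotting library uses, so the fences may not match a plot exactly. In Plotly Express, `px.box` accepts a `points` argument (for example `points="outliers"`) to control which points are drawn.

```python
import pandas as pd

# Hypothetical share counts for one category, with one extreme value
shares = pd.Series([120, 150, 180, 200, 210, 260, 5000], name="FB_Shares")

# 1.5 * IQR rule: values outside the fences are treated as outliers
q1, q3 = shares.quantile([0.25, 0.75])
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = shares[(shares < lower) | (shares > upper)]

print("Fences:", lower, upper)
print("Outliers:", outliers.tolist())
print("Mean:", shares.mean(), "Median:", shares.median())
print("Mode(s):", shares.mode().tolist())   # mode() can return several values
print("Std:", shares.std(), "Variance:", shares.var())
```

Comparing the mean with the median is a quick check on how strongly the outliers pull the average.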



In [20]:
df_medical = df_grouped.get_group('Medical')

fig = px.box(df_medical, x = "FB_Shares", title = "Distribution of FB Shares")
fig.show()

fig = px.box(df_medical, x = "GFM_hearts", title = "Distribution of GFM Hearts")
fig.show()

# FB shares: central tendency and spread
print("Mean:", df_medical["FB_Shares"].mean())
print("Median:", df_medical["FB_Shares"].median())
print("Mode:", df_medical["FB_Shares"].mode())
print("Variance:", df_medical["FB_Shares"].var())
print("Standard Deviation:", df_medical["FB_Shares"].std())

# GFM hearts: central tendency and spread
print("Mean:", df_medical["GFM_hearts"].mean())
print("Median:", df_medical["GFM_hearts"].median())
print("Mode:", df_medical["GFM_hearts"].mode())
print("Variance:", df_medical["GFM_hearts"].var())
print("Standard Deviation:", df_medical["GFM_hearts"].std())

TypeError: ignored

In [21]:
df_emergency = df_grouped.get_group('Emergency')

fig = px.box(df_emergency, x="FB_Shares", title="Distribution of FB Shares")
fig.show()

fig = px.box(df_emergency, x="GFM_hearts", title="Distribution of GFM Hearts")
fig.show()

# FB shares: central tendency and spread
print("Mean:", df_emergency["FB_Shares"].mean())
print("Median:", df_emergency["FB_Shares"].median())
print("Mode:", df_emergency["FB_Shares"].mode())
print("Variance:", df_emergency["FB_Shares"].var())
print("Standard Deviation:", df_emergency["FB_Shares"].std())

# GFM hearts: central tendency and spread
print("Mean:", df_emergency["GFM_hearts"].mean())
print("Median:", df_emergency["GFM_hearts"].median())
print("Mode:", df_emergency["GFM_hearts"].mode())
print("Variance:", df_emergency["GFM_hearts"].var())
print("Standard Deviation:", df_emergency["GFM_hearts"].std())

TypeError: ignored

In [22]:
df_memorial = df_grouped.get_group('Memorial')

fig = px.box(df_memorial, x="FB_Shares", title="Distribution of FB Shares")
fig.show()

fig = px.box(df_memorial, x="GFM_hearts", title="Distribution of GFM Hearts")
fig.show()

# FB shares: central tendency and spread
print("Mean:", df_memorial["FB_Shares"].mean())
print("Median:", df_memorial["FB_Shares"].median())
print("Mode:", df_memorial["FB_Shares"].mode())
print("Variance:", df_memorial["FB_Shares"].var())
print("Standard Deviation:", df_memorial["FB_Shares"].std())

# GFM hearts: central tendency and spread
print("Mean:", df_memorial["GFM_hearts"].mean())
print("Median:", df_memorial["GFM_hearts"].median())
print("Mode:", df_memorial["GFM_hearts"].mode())
print("Variance:", df_memorial["GFM_hearts"].var())
print("Standard Deviation:", df_memorial["GFM_hearts"].std())

TypeError: ignored

In [25]:
df_memorial.GFM_hearts.describe()

count       72.0
unique      56.0
top       1000.0
freq         7.0
Name: GFM_hearts, dtype: float64
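
The `count / unique / top / freq` layout above is what `describe()` reports for a non-numeric column, which suggests `GFM_hearts` is stored with object dtype here. The sketch below contrasts the two summaries on a made-up mixed series; the values are illustrative only.

```python
import pandas as pd

# Hypothetical column mixing floats and numeric-looking strings
hearts = pd.Series([1000.0, "836", 1000.0, "106"], dtype=object)

object_summary = hearts.describe()    # count / unique / top / freq
print(object_summary)

# After coercion the same call reports mean, std, min, quartiles, and max
numeric = pd.to_numeric(hearts, errors="coerce")
numeric_summary = numeric.describe()
print(numeric_summary)
```

Seeing `unique` and `top` in a summary of a column that should be numeric is a useful cue to check its dtype before computing statistics.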

*summarize your outlier results here*

# Explore on your own

1. Select one category and use descriptive stats to explore the success of campaigns in this category.
1. Use graphs where appropriate.
1. Provide commentary along the way on what descriptive measures you are using and why.
1. Provide a one to two paragraph summary of the success of this category.

*use as many code and text blocks along the way*
*Also make sure to consult the pandas and plotly documentation along the way*
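
One possible starting point, offered as a sketch and not as the expected answer, is to define success as the share of the goal that was raised. The `pct_of_goal` and `met_goal` columns and the sample values below are hypothetical; other definitions of success (donor counts, shares, time to goal) are equally reasonable.

```python
import pandas as pd

# Hypothetical campaigns from a single category, with numeric Goal values
toy = pd.DataFrame({
    "Title": ["A", "B", "C", "D"],
    "Amount_Raised": [12000.0, 5000.0, 30000.0, 800.0],
    "Goal": [10000.0, 20000.0, 30000.0, 1000.0],
})

# Assumed success measures: percent of goal raised, and whether the goal was met
toy["pct_of_goal"] = toy["Amount_Raised"] / toy["Goal"] * 100
toy["met_goal"] = toy["Amount_Raised"] >= toy["Goal"]

print(toy["pct_of_goal"].describe())
print("Share of campaigns meeting their goal:", toy["met_goal"].mean())
```

The mean of a boolean column is the proportion of `True` values, which gives a one-number summary to pair with the distribution of `pct_of_goal`.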