## Description
Founded in 2000 by a Bronx history teacher, DonorsChoose.org has raised $685 million for America's classrooms. Teachers at three-quarters of all the public schools in the U.S. have come to DonorsChoose.org to request what their students need, making DonorsChoose.org the leading platform for supporting public education.

To date, 3 million people and partners have funded 1.1 million DonorsChoose.org projects. But teachers still spend more than a billion dollars of their own money on classroom materials. To get students what they need to learn, the team at DonorsChoose.org needs to be able to connect donors with the projects that most inspire them.

In the second Kaggle Data Science for Good challenge, DonorsChoose.org, in partnership with Google.org, is inviting the community to help them pair up donors to the classroom requests that will most motivate them to make an additional gift. To support this challenge, DonorsChoose.org has supplied anonymized data on donor giving from the past five years. The winning methods will be implemented in DonorsChoose.org email marketing campaigns.

## Problem Statement
DonorsChoose.org has funded over 1.1 million classroom requests through the support of 3 million donors, the majority of whom were making their first-ever donation to a public school. If DonorsChoose.org can motivate even a fraction of those donors to make another donation, that could have a huge impact on the number of classroom requests fulfilled.

A good solution will enable DonorsChoose.org to build targeted email campaigns recommending specific classroom requests to prior donors. Part of the challenge is to assess the needs of the organization, uncover insights from the data available, and build the right solution for this problem. Submissions will be evaluated on the following criteria:

In [11]:
import pandas as pd
import matplotlib.pyplot as plt
from datetime import date
import calendar
import seaborn as sns
import datetime

donations = pd.read_csv('Donations.csv')
donors = pd.read_csv('Donors.csv',low_memory=False)
# error_bad_lines / warn_bad_lines were removed in pandas 2.0; on_bad_lines (pandas >= 1.3) replaces them.
schools = pd.read_csv('Schools.csv', on_bad_lines='warn')
teachers = pd.read_csv('Teachers.csv', on_bad_lines='warn')
projects = pd.read_csv('Projects.csv', on_bad_lines='warn')
resources = pd.read_csv('Resources.csv', on_bad_lines='skip')

## Step 1: Data overview and Cleaning

In [None]:
donations.head()

In [None]:
donations.info()

In [None]:
donations.isnull().sum()

In [None]:
donors.head()

In [None]:
donors.shape

In [None]:
donors.isnull().sum()
## Approximately 10% of rows have a null zip or city, so those rows are dropped.

In [None]:
donors = donors.dropna(how ='any', axis=0)
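
The cost of a blanket `dropna` like the one above can be checked first. The following is a minimal sketch on a made-up stand-in for the donors table (`toy_donors` and its values are illustrative assumptions, not the real data). It reports the share of nulls per column and the share of rows that would be lost.

```python
import pandas as pd

# Toy stand-in for the donors table; all values are made up for illustration.
toy_donors = pd.DataFrame({
    "Donor ID": ["a", "b", "c", "d"],
    "Donor City": ["Chicago", None, "Boston", "Austin"],
    "Donor Zip": ["606", None, "021", None],
})

# isnull().mean() is the fraction of missing values in each column.
null_share = toy_donors.isnull().mean() * 100

# Share of rows removed when any row with at least one null is dropped.
rows_before = len(toy_donors)
rows_after = len(toy_donors.dropna(how="any", axis=0))
rows_dropped_pct = 100 * (rows_before - rows_after) / rows_before

print(null_share)
print(f"{rows_dropped_pct:.1f}% of rows would be dropped")
```

If the dropped share is much larger than the per-column null share, passing `subset=[...]` to `dropna` for only the columns actually needed may lose fewer rows.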

In [None]:
donors.isnull().sum()

In [None]:
## Let's see how many donors are one-time donors (exactly one row in the donations table)
donation_counts = donations['Donor ID'].value_counts()
(donation_counts == 1).sum()

In [None]:
## Donors by city
donors['Donor City'].value_counts().to_frame().head(10)

In [None]:
schools.head()
#schools.shape

In [None]:
schools.isnull().sum()

In [None]:
schools = schools.dropna(how ='any',axis=0)

In [None]:
schools.isnull().sum()

In [None]:
teachers.head()

In [None]:
teachers.isnull().sum()

In [None]:
teachers.shape

In [None]:
teachers = teachers.dropna(how = 'any', axis=0)

In [None]:
projects.head()

In [None]:
projects.isnull().sum()

In [None]:
resources.head()

In [None]:
resources.shape
resources = resources.dropna(how='any',axis=0)

In [None]:
resources.isnull().sum()

## Step 2. Statistical overview of the Data

#### Donation amount

In [None]:
pd.options.display.float_format = "{:.2f}".format
donations["Donation Amount"].describe()


The minimum donation amount is $0.01.

The mean donation amount is $60.67.

The maximum donation amount is $60,000.


In [None]:
print("Total donation amount raised by DonorsChoose.org is ${:.2f}".format(donations["Donation Amount"].sum()))

#### Top Donor Cities

In [None]:
temp = donors['Donor City'].value_counts().to_frame().head(10)
temp.plot( kind='bar', title = 'Top Donor cities')
plt.xlabel('City name')
plt.ylabel('Count')


#### Top Donor States

In [None]:
temp = donors["Donor State"].value_counts().to_frame().head(10)
temp.plot(kind='bar', title='Top Donor States')
plt.xlabel('State')
plt.ylabel('Count')

#### Donor Is Teacher or Non-Teacher

In [None]:
# set_axis no longer accepts inplace in pandas 2.0; it returns a new DataFrame by default.
df = donors['Donor Is Teacher'].value_counts().reset_index().set_axis(['Teacher/NoTeacher', 'Counts'], axis=1)
plt.pie(df['Counts'],labels = df['Teacher/NoTeacher'],autopct='%1.1f%%',startangle=90)
plt.axis('equal')

#### Statewise Teacher Vs Non Teacher donors

In [None]:
df = donors.groupby(['Donor State','Donor Is Teacher'])['Donor State'].count().reset_index(name="count")
#df.head()
#df.plot(kind='bar',x='Donor State')
#plt.rcParams['figure.figsize']  = [40,20]
#plt.show()

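# df is already aggregated, so plot its precomputed counts; countplot would count the rows of df instead.
sns.barplot(x="Donor State", y="count", hue="Donor Is Teacher", data=df)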


#### Yearly Donations Trend

In [None]:
donations['Donation Received Date'] = pd.to_datetime(donations['Donation Received Date'])
donations['year'] = donations['Donation Received Date'].dt.year
temp = donations.groupby('year').agg({'Donor ID' : 'count'})
temp.plot()
plt.xlabel('Year')
# Counting 'Donor ID' per year counts donation rows, not distinct donors.
plt.ylabel('Number of donations')
plt.title('Yearly trend in number of donations')
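
A yearly count of `Donor ID` rows measures donations, while a count of distinct IDs measures donors. The sketch below shows the difference on made-up data (`toy_donations` is an illustrative assumption, not the real table) by computing both per year.

```python
import pandas as pd

# Toy stand-in for the donations table; all values are made up for illustration.
toy_donations = pd.DataFrame({
    "Donor ID": ["a", "a", "b", "c", "c", "c"],
    "Donation Received Date": pd.to_datetime([
        "2016-03-01", "2016-09-15", "2016-11-30",
        "2017-01-10", "2017-02-20", "2017-12-05",
    ]),
})
toy_donations["year"] = toy_donations["Donation Received Date"].dt.year

# 'count' counts rows (donations); 'nunique' counts distinct donors.
per_year = toy_donations.groupby("year")["Donor ID"].agg(
    donations="count",
    unique_donors="nunique",
)
print(per_year)
```

In this toy data both years have three donations, but 2016 has two distinct donors and 2017 only one, so the two trends can diverge.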


#### Donation Received Days

In [None]:
# .dt.weekday_name was removed in pandas 1.0; .dt.day_name() is the replacement.
donations['weekday'] = pd.to_datetime(donations['Donation Received Date']).dt.day_name()
#donations.head()
temp = donations['weekday'].value_counts()
temp.plot(kind='bar')
plt.xlabel('Week days')
plt.ylabel('Count')
plt.title('Donation Received Days')

More donations are received on weekdays than on weekends.
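
Because a week has five weekdays and only two weekend days, a per-day average is a fairer comparison than raw totals. The following is a minimal sketch on made-up dates (`toy_dates` is an illustrative assumption) that computes the weekend share and the per-day averages.

```python
import pandas as pd

# Toy donation dates; values are made up for illustration.
toy_dates = pd.Series(pd.to_datetime([
    "2018-01-01",  # Monday
    "2018-01-02",  # Tuesday
    "2018-01-05",  # Friday
    "2018-01-06",  # Saturday
    "2018-01-07",  # Sunday
]))

# dayofweek is 0 for Monday through 6 for Sunday, so 5 and 6 are the weekend.
is_weekend = toy_dates.dt.dayofweek >= 5

weekend_share = is_weekend.mean()          # fraction of donations on weekends
per_weekday_day = (~is_weekend).sum() / 5  # average donations per weekday
per_weekend_day = is_weekend.sum() / 2     # average donations per weekend day

print(weekend_share, per_weekday_day, per_weekend_day)
```

In this toy data weekdays have more donations in total, yet each weekend day receives more on average, which is why the normalization matters.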

#### Top 20 Project categories

In [None]:
temp = projects['Project Subject Category Tree'].value_counts().head(20)
temp.plot(kind='bar')
plt.xlabel('Project Subject Category')
plt.ylabel('Count')
plt.title('Project Subject Category')


#### Top 20 Project Subject Subcategory Tree

In [None]:
temp = projects['Project Subject Subcategory Tree'].value_counts().head(20)
temp.plot(kind='bar')
plt.xlabel('Project Subject Subcategory')
plt.ylabel('Count')
plt.title('Project Subject Subcategory')

In [None]:
# Distinct project types
projects['Project Type'].unique()

In [None]:
temp = projects.groupby(['Project Type']).agg({'Project ID': 'count'})
temp.plot(kind='bar')
plt.ylabel('Count')
plt.title('Project Type')

In [None]:
temp = schools['School Metro Type'].value_counts()
temp.plot(kind='bar')
plt.xlabel('Metro Type')
plt.ylabel('Count')
plt.title('School Metro Type')

Most of the schools are in suburban areas.

In [None]:
teachers.head()

In [None]:
temp = teachers['Teacher Prefix'].value_counts()
temp.plot(kind='pie',autopct='%1.1f%%',startangle=90)
plt.xlabel('Prefix')
plt.ylabel('Count')
plt.title('Distribution of Teacher prefixes')
plt.axis('equal')

Judging by prefix, 86.4% of the teachers who posted projects are women.
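
A figure like this comes from adding up the shares of the prefixes treated as female. The sketch below uses made-up prefixes (`toy_prefixes` is illustrative). Treating `Mrs.` and `Ms.` as the female prefixes is an assumption, and neutral prefixes such as `Teacher` stay unclassified.

```python
import pandas as pd

# Toy teacher prefixes; values are made up for illustration.
toy_prefixes = pd.Series(
    ["Mrs.", "Ms.", "Mr.", "Mrs.", "Teacher", "Ms.", "Mrs.", "Mr."]
)

# normalize=True turns counts into fractions of the total.
prefix_share = toy_prefixes.value_counts(normalize=True) * 100

# Assumption: only these two prefixes are counted as female.
female_prefixes = ["Mrs.", "Ms."]
female_share = prefix_share.reindex(female_prefixes, fill_value=0).sum()

print(prefix_share)
print(f"{female_share:.1f}% of teachers use a female prefix")
```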

In [None]:
# .dt.weekday_name was removed in pandas 1.0; .dt.day_name() is the replacement.
teachers['weekday'] = pd.to_datetime(teachers['Teacher First Project Posted Date']).dt.day_name()
temp = teachers['weekday'].value_counts()
temp.plot(kind='bar')
plt.xlabel('Week days')
plt.ylabel('Count')
plt.title('Project posted Day')

Teachers most often post their first project on a Saturday or Sunday.

In [None]:
resources['Resource Item Name'].value_counts().to_frame()

#### Project Status

In [None]:
temp = projects['Project Current Status'].value_counts()
temp.plot(kind='pie',autopct='%1.1f%%',startangle=90)
plt.xlabel('Status')
plt.ylabel('Count')
plt.title('Project current status')
plt.axis('equal')

74.5% of projects are fully funded.

In [24]:
# strptime() accepts one string, not a Series; pd.to_datetime parses whole columns.
funded_in = pd.to_datetime(projects['Project Fully Funded Date']) - pd.to_datetime(projects['Project Posted Date'])
projects['Days to Fund'] = funded_in.dt.days  # new column, so the projects DataFrame is not overwritten
print(projects['Days to Fund'].describe())

Projects take a minimum of     days and a maximum of     days to get fully funded.

In [18]:
projects.head()

Unnamed: 0,Project ID,School ID,Teacher ID,Teacher Project Posted Sequence,Project Type,Project Title,Project Essay,Project Short Description,Project Need Statement,Project Subject Category Tree,Project Subject Subcategory Tree,Project Grade Level Category,Project Resource Category,Project Cost,Project Posted Date,Project Expiration Date,Project Current Status,Project Fully Funded Date
0,7685f0265a19d7b52a470ee4bac883ba,e180c7424cb9c68cb49f141b092a988f,4ee5200e89d9e2998ec8baad8a3c5968,25,Teacher-Led,Stand Up to Bullying: Together We Can!,Did you know that 1-7 students in grades K-12 ...,Did you know that 1-7 students in grades K-12 ...,"My students need 25 copies of ""Bullying in Sch...",Applied Learning,"Character Education, Early Development",Grades PreK-2,Technology,361.8,2013-01-01,2013-05-30,Fully Funded,2013-01-11
1,f9f4af7099061fb4bf44642a03e5c331,08b20f1e2125103ed7aa17e8d76c71d4,cca2d1d277fb4adb50147b49cdc3b156,3,Teacher-Led,Learning in Color!,"Help us have a fun, interactive listening cent...","Help us have a fun, interactive listening cent...","My students need a listening center, read alon...","Applied Learning, Literacy & Language","Early Development, Literacy",Grades PreK-2,Technology,512.85,2013-01-01,2013-05-31,Expired,
2,afd99a01739ad5557b51b1ba0174e832,1287f5128b1f36bf8434e5705a7cc04d,6c5bd0d4f20547a001628aefd71de89e,1,Teacher-Led,Help Second Grade ESL Students Develop Languag...,Visiting or moving to a new place can be very ...,Visiting or moving to a new place can be very ...,My students need beginning vocabulary audio ca...,Literacy & Language,ESL,Grades PreK-2,Supplies,435.92,2013-01-01,2013-05-30,Fully Funded,2013-05-22
3,c614a38bb1a5e68e2ae6ad9d94bb2492,900fec9cd7a3188acbc90586a09584ef,8ed6f8181d092a8f4c008b18d18e54ad,40,Teacher-Led,Help Bilingual Students Strengthen Reading Com...,Students at our school are still working hard ...,Students at our school are still working hard ...,My students need one copy of each book in The ...,Literacy & Language,"ESL, Literacy",Grades 3-5,Books,161.26,2013-01-01,2013-05-31,Fully Funded,2013-02-06
4,ec82a697fab916c0db0cdad746338df9,3b200e7fe3e6dde3c169c02e5fb5ae86,893173d62775f8be7c30bf4220ad0c33,2,Teacher-Led,Help Us Make Each Minute Count!,"""Idle hands"" were something that Issac Watts s...","""Idle hands"" were something that Issac Watts s...","My students need items such as Velcro, two pou...",Special Needs,Special Needs,Grades 3-5,Supplies,264.19,2013-01-01,2013-05-30,Fully Funded,2013-01-01
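
As a quick check of the funding-time calculation, the sketch below applies `pd.to_datetime` to the posted and fully funded dates of the five rows shown above. The `sample` frame and the `Days to Fund` column name are illustrative, and the result describes only these five rows, not the full dataset.

```python
import pandas as pd

# Posted and fully funded dates copied from the five projects.head() rows above.
sample = pd.DataFrame({
    "Project Posted Date": ["2013-01-01"] * 5,
    "Project Fully Funded Date": [
        "2013-01-11", None, "2013-05-22", "2013-02-06", "2013-01-01",
    ],
})

posted = pd.to_datetime(sample["Project Posted Date"])
funded = pd.to_datetime(sample["Project Fully Funded Date"])  # None becomes NaT

# Subtracting datetime columns gives timedeltas; .dt.days extracts whole days.
sample["Days to Fund"] = (funded - posted).dt.days

# min() and max() skip the NaN produced by the expired, unfunded project.
print(sample["Days to Fund"].min(), sample["Days to Fund"].max())
```

The expired project has no funded date, so its duration is missing rather than zero, and it does not distort the minimum.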
