# Data Science For Good: DonorsChoose.org
## Recommender system (Part 2)

This notebook follows up from the EDA and Data Cleaning (Part 1). The aim of this notebook is to create a recommendation system that enables DonorsChoose.org to build targeted email campaigns recommending specific classroom requests to prior donors. For example, if a donor has donated to a classroom project that aims to improve children literacy, they may want to donate to other similar types of projects.

The solution I propose for this problem is a content-based recommender system. The system utilizes users' past donations to propose similar classroom projects based on their attributes.

# System Requirements and Python Libraries Used

Date: 30/07/2018

Version: 1.0

Computer Requirements: 16GB Ram

Environment: Python 3.6 and Jupyter notebook

In [1]:
import pandas as pd
import numpy as np
from scipy.sparse import coo_matrix, vstack
from sklearn.metrics.pairwise import cosine_similarity
from sklearn.preprocessing import normalize
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import linear_kernel

# 1. Loading the Data

- Donations.CSV
- Projects.CSV

In [2]:
donations_df = pd.read_csv("Adj_Donations.csv")
projects_df = pd.read_csv("Adj_Projects.csv")

# 2. Pre-processing
Merge both datasets by Project ID. 

In [3]:
combine_data = pd.merge(donations_df,projects_df, on='Project ID', how = "inner").drop(['index'],axis = 1)

Since the dataset is a time series of donations made by users, preserving the time series is important as the system will record and adjust the recommendations made to the donors over time.

Therefore, we sort by donation date in ascending order from earliest to latest.  

In [4]:
combine_data['Donation Received Date'] = pd.to_datetime(combine_data['Donation Received Date'])

In [5]:
combine_data = combine_data.sort_values(by='Donation Received Date')

For testing purposes, we have limit the dataset to just the first 50,000 rows. 

In [6]:
combine_df = combine_data.head(50000).reset_index(drop=True)

We create a project dataframe that will be used to create new text features as an attribute that characterizes each classroom project.   

The project dataset is filtered by projects that only show up in the first 50,000 rows of donation.

In [7]:
projects_used = combine_df['Project ID'].unique()

In [8]:
project_df = projects_df[projects_df['Project ID'].isin(projects_used)].copy()

In [9]:
project_df = project_df.reset_index(drop=True)

In [10]:
project_df.head()

Unnamed: 0,index,Project ID,School ID,Teacher ID,Teacher Project Posted Sequence,Project Type,Project Title,Project Essay,Project Short Description,Project Need Statement,Project Subject Category Tree,Project Subject Subcategory Tree,Project Grade Level Category,Project Resource Category,Project Cost,Project Posted Date,Project Expiration Date,Project Current Status,Project Fully Funded Date
0,0,7685f0265a19d7b52a470ee4bac883ba,e180c7424cb9c68cb49f141b092a988f,4ee5200e89d9e2998ec8baad8a3c5968,25,Teacher-Led,Stand Up to Bullying: Together We Can!,Did you know that 1-7 students in grades K-12 ...,Did you know that 1-7 students in grades K-12 ...,"My students need 25 copies of ""Bullying in Sch...",Applied Learning,"Character Education, Early Development",Grades PreK-2,Technology,361.8,2013-01-01,2013-05-30,Fully Funded,2013-01-11
1,2,afd99a01739ad5557b51b1ba0174e832,1287f5128b1f36bf8434e5705a7cc04d,6c5bd0d4f20547a001628aefd71de89e,1,Teacher-Led,Help Second Grade ESL Students Develop Languag...,Visiting or moving to a new place can be very ...,Visiting or moving to a new place can be very ...,My students need beginning vocabulary audio ca...,Literacy & Language,ESL,Grades PreK-2,Supplies,435.92,2013-01-01,2013-05-30,Fully Funded,2013-05-22
2,3,c614a38bb1a5e68e2ae6ad9d94bb2492,900fec9cd7a3188acbc90586a09584ef,8ed6f8181d092a8f4c008b18d18e54ad,40,Teacher-Led,Help Bilingual Students Strengthen Reading Com...,Students at our school are still working hard ...,Students at our school are still working hard ...,My students need one copy of each book in The ...,Literacy & Language,"ESL, Literacy",Grades 3-5,Books,161.26,2013-01-01,2013-05-31,Fully Funded,2013-02-06
3,4,ec82a697fab916c0db0cdad746338df9,3b200e7fe3e6dde3c169c02e5fb5ae86,893173d62775f8be7c30bf4220ad0c33,2,Teacher-Led,Help Us Make Each Minute Count!,"""Idle hands"" were something that Issac Watts s...","""Idle hands"" were something that Issac Watts s...","My students need items such as Velcro, two pou...",Special Needs,Special Needs,Grades 3-5,Supplies,264.19,2013-01-01,2013-05-30,Fully Funded,2013-01-01
4,5,563958074d7b12b48b939279eb59e6ca,b79a19772090efccde93b3a5934d829f,5ef1793ff657860ca7856d475715ec2a,4,Teacher-Led,It's about Time... Time for Kids!,We know that success in school is directly rel...,We know that success in school is directly rel...,My students need 24 subscriptions to Time for ...,"Literacy & Language, History & Civics","Literacy, Social Sciences",Grades 3-5,Other,175.15,2013-01-01,2013-05-31,Fully Funded,2013-02-01


Combine the two text features into a new variable.

In [11]:
text = ['Project Title','Project Essay']
for col in text:
    project_df[col] = project_df[col].astype(str).fillna('')
    project_df[col] = project_df[col].str.lower()

In [12]:
project_df['text'] = project_df['Project Title'] +" "+ project_df['Project Essay']

# 3. Initial content-based recommendation system
Below is a simple content-based recommendation system which uses Term Frequency - Inverse Document Frequency (TF-IDF) to parse through the new text feature created above. Term frequency is where words such as 'Books' within a project's text feature are counted and divided by the total number of words in the text feature. Inverse Document Frequency is a logarithmic function of total number of documents are divided by the number of classroom projects that contain the words 'Books'. More information can be found in [sklearn documentation section 4.2.3.4](http://scikit-learn.org/stable/modules/feature_extraction.html#text-feature-extraction)


After calculating the TF-IDF, Cosine Similarity matrix is computed which is a method to find similar classroom projects from TF-IDF. This system obtains the latest donation made by the donor, and is provided a new recommendation based on the Similarity Matrix. 

In [13]:
class ContentRecSystem:
    """
    A simple content-based recommendation system
    
    """
    def __init__(self):
        self.tfidf_matrix = None
        
    def _compute_tfidf(self,text):
        """
         Creates tf-idf matrix
        
        :param text: text feature used for tf-idf computation
        :return: tf-idf matrix
        """
        tf = TfidfVectorizer(strip_accents='unicode',
                         analyzer='word',
                         ngram_range=(1, 3),
                         max_df=0.9,
                         lowercase=True,
                         max_features=5000,
                         stop_words='english')
        
        self.tfidf_matrix = tf.fit_transform(text)
        
        
    def _get_donor_latest_donation_project_id(self, combine_df, user_id):
        """
         Obtains the latest donation made by the donor
        
        :param combine_df: main dataframe
        :param user_id: donor ID
        :return: project ID that donor last donated
        """
        latest_donation = combine_df[combine_df['Donor ID'] == user_id][-1:].values[0,0]
        return latest_donation
    
    
    def retrieve_similar_projects(self, combine_df, project_df, user_id):
        
        """
         Obtains the latest donation made by the donor
        
        :param combine_df: main dataframe
        :param project_df: projects dataframe
        :param user_id: donor ID
        :return: A list of tuples. Tuples contain cosine similarity value and project ID recommended by the system
        """

        # obtain project_id from last donation
        project_id = self._get_donor_latest_donation_project_id(combine_df, user_id)
        
        # compute tfidf not stored in class
        if self.tfidf_matrix == None:
            text = project_df['text']
            self._compute_tfidf(text)
        
        # linear_kernel provides the same result as cosine_similarity since tfidf_matrix by tfidf_matrix is used. 
        # It is slightly faster
        cosine_similarities = linear_kernel(self.tfidf_matrix, self.tfidf_matrix)
        
        pj_index = project_df[project_df['Project ID'] == project_id].index
        
        # obtain top 10 cosine similar projects to last project
        similar_indices = cosine_similarities[pj_index].argsort().flatten()[:-11:-1]
        
        # pair cosine similarity with project id into a list
        recommend_projects = [(cosine_similarities[pj_index].flatten()[i], \
                               project_df['Project ID'][i]) for i in similar_indices][1:]
           
        return recommend_projects   

From the model below, we obtain a list of projects suggested. However, we should further extend this model to utilize past donations instead of just the latest donations. 

In [14]:
content_model = ContentRecSystem()
content_model.retrieve_similar_projects(combine_df,project_df, '01afd72c41476784417b9479646876f8')

[(0.4137130158890267, '48eb8af6e730d25389659e9a074c6f97'),
 (0.37326512658196076, 'ca8a1d7e5ac6489eb741c852ca489376'),
 (0.36926618980403975, 'f24ebbebe377d16183b6e1164b458e52'),
 (0.3629362304904644, '1d24755c1f5c3a39c65d9c90ed3988b3'),
 (0.35850924216587665, '7e841b3308cb5472ba49d57a82f93e2b'),
 (0.35802521666602266, '57148dd44d084568e8980be78665f018'),
 (0.3494154530664604, '12a8954cc9c341c62a099ff966e3797d'),
 (0.346798626575864, '77477fe9d07ec1bc14da4fc8e385570e'),
 (0.34313339860722164, 'f7d61ecd29d0e05a17b09e4d2906a8bf')]

# 4. Updated content-based recommendation system
Below is an extended content-based recommendation system does the same as the simple content-based recommendation system shown above, except it collects all past donations and builds a donor profile which the system recommends to the user. The model's user profile is influenced by the weighting system, where the most recent projects the donor has donated towards have a greater effect on the user profile while earlier donations have a lesser effect on the user profile. 

In [15]:
class UpdatedContentRecSystem:
    """
    A updated content-based recommendation system with user profiles
    
    """
    def __init__(self):
        self.tfidf_matrix = None
        
    def _compute_tfidf(self,text):
        """
         Creates tf-idf matrix
        
        :param text: text feature used for tf-idf computation
        :return: tf-idf matrix
        """
        tf = TfidfVectorizer(strip_accents='unicode',
                         analyzer='word',
                         ngram_range=(1, 3),
                         max_df=0.9,
                         lowercase=True,
                         max_features=5000,
                         stop_words='english')
        
        self.tfidf_matrix = tf.fit_transform(text)
    
    
    def _compute_weighted_user(self, combine_df, user_id):
        """
         Creates user profile
        
        :param combine_df: main dataframe
        :param user_id: donor ID
        :return: An array of user profile from weighted projects previously donated 
        """
        # contains all project's tfidf 
        projects_list = []
        
        # A list of donor's unique projects
        donor_unique_df = self._get_donor_unique_df(combine_df)
        # A list of past projects user donated towards
        donor_projects = self._get_users_projects(donor_unique_df, user_id)
        
        # build a list of project's tfidf
        for project in donor_projects:
            pj_index = project_df[project_df['Project ID'] == project].index.values[0]
            projects_list.append(self.tfidf_matrix[pj_index:pj_index+1])
            
        # stack the arrays of tfidf
        projects_matrix = vstack(projects_list)
        
        # obtain weights for each project
        weights = self._create_weights(donor_unique_df, user_id)
        
        # multiply weights with project matrix divide by weights
        multiply = np.sum(projects_matrix.multiply(weights), axis=0) / np.sum(weights)
        
        # normalize using sklearn
        weighted_user = normalize(multiply)
        
        return weighted_user
        
        
    def _get_donor_latest_donation_project_id(self, combine_df, user_id):
        """
         Obtains the latest donation made by the donor
        
        :param combine_df: main dataframe
        :param user_id: donor ID
        :return: project ID that donor last donated
        """
        
        latest_donation = combine_df[combine_df['Donor ID'] == user_id][-1:].values[0,0]
        return latest_donation
    
    
    def _create_weights(self, donor_unique_df, user_id):
        """
         Creates a list of weights. Most recent projects have the greatest influence on user profile
        
        :param donor_unique_df: donor's unique projects dataframe
        :param user_id: donor ID
        :return: weights for user profile's projects
        """
        # the most recent project has the highest weight
        weights = [1]
        
        # builds a list of weights that decrease over time
        for i in range(len(donor_unique_df.loc[user_id])-1):
            weights.append(weights[i] * 1/2)
        return np.array(weights).reshape(-1,1)
    
    def _get_donor_unique_df(self, combine_df):
        """
         Wrangles the data to provide donor's unique projects
        
        :param combine_df: main dataframe
        :return: donor's unique projects dataframe
        """
        
        donor_unique_projects = combine_df.groupby(['Donor ID', 'Project ID'])['Donation Amount'].sum().reset_index()
        donor_unique_df = donor_unique_projects[donor_unique_projects['Project ID'].isin(project_df['Project ID'])]\
        .set_index('Donor ID')
        
        return donor_unique_df
    
    def _get_users_projects(self, donor_unique_df, user_id):
        """
         Obtain donor's unique donated projects
        
        :param donor_unique_df: donor's unique projects dataframe
        :param user_id: donor ID
        :return: donor's list of projects
        """
    
        donor_projects = donor_unique_df.loc[user_id]
        return donor_projects['Project ID']
    
    def retrieve_similar_projects(self, combine_df, project_df, user_id):
        """
         Obtains the latest donation made by the donor
        
        :param combine_df: main dataframe
        :param project_df: projects dataframe
        :param user_id: donor ID
        :return: A list of tuples. Tuples contain cosine similarity value and project ID recommended by the system
        """
        
        if self.tfidf_matrix == None:
            text = project_df['text']
            self._compute_tfidf(text)
        
        user_weight = self._compute_weighted_user(combine_df, user_id)
        
        cosine_similarities = cosine_similarity(user_weight, self.tfidf_matrix)
        
        similar_indices = cosine_similarities.argsort().flatten()[:-101:-1]
        recommend_projects = [(project_df['Project ID'][i], cosine_similarities[0,i]) for i in similar_indices]
        
        donor_unique_df = self._get_donor_unique_df(combine_df)
        
        project_list = list(self._get_users_projects(donor_unique_df,user_id))
        
        filtered_recommend_projects = [i for i in recommend_projects if i[0] not in project_list]
        
        return filtered_recommend_projects[:10]      

From the results below, we obtained a slightly different set of recommended projects due to our constructed user profile. 

In [16]:
ucontent_model = UpdatedContentRecSystem()
ucontent_model.retrieve_similar_projects(combine_df,project_df, '01afd72c41476784417b9479646876f8')

[('48eb8af6e730d25389659e9a074c6f97', 0.35353551327282295),
 ('1d24755c1f5c3a39c65d9c90ed3988b3', 0.34135566368564324),
 ('f24ebbebe377d16183b6e1164b458e52', 0.34021887212633994),
 ('ca8a1d7e5ac6489eb741c852ca489376', 0.3365361363226553),
 ('82b016127576da540a80acb139f54b15', 0.3226863058271395),
 ('57148dd44d084568e8980be78665f018', 0.3212905921735998),
 ('7e841b3308cb5472ba49d57a82f93e2b', 0.3181711300428763),
 ('12a8954cc9c341c62a099ff966e3797d', 0.3118784666746285),
 ('5bc0f656a05088ca43cd6eac7bcd102c', 0.31183864683811974),
 ('19f69113b264c65a56774a655203289c', 0.3112840995279238)]

# 5. Evaluation

Now we need to quantify how well our model performs. Due to time series nature of the dataset, I have choosen to implement a walk forward approach to model testing. For example, if a person has only made 5 unique donations over the period of the dataset, the model will start using their first donation history to provide a recommendation on their second. Afterwards, the third recommendation is support by the previous two donations, and so forth. 

For each recommendation, a list of 10 projects are recommended and if the next actual donation appear in the list, the model has successfully offered a good recommendation list. This is called Top-N recommendation (where N is 10 in our case). We will calculate the recall@10 at each prediction interval (e.g. first donation history recommend/predict the second donation, etc).  

In addition to recall metric, I compute the number of correct recommendations for all intervals, which is the accuracy metric. 

Note: Precision is not valid for this model as there are no 'bad' donations, as there is no threshold for a good donation. 

The model below is adjusted to incorporate the testing.

In [17]:
class ContentRecSystemWeightsTest:
    """
    A test content-based recommendation system
    """
    def __init__(self):
        self.tfidf_matrix = None
        
    def _compute_tfidf(self,text):
        """
         Creates tf-idf matrix
        
        :param text: text feature used for tf-idf computation
        :return: tf-idf matrix
        """
        tf = TfidfVectorizer(strip_accents='unicode',
                         analyzer='word',
                         ngram_range=(1, 3),
                         max_df=0.9,
                         lowercase=True,
                         max_features=5000,
                         stop_words='english')
        
        self.tfidf_matrix = tf.fit_transform(text)
    
    
    def _compute_weighted_user(self, combine_df, user_id, project_k):
        """
         Creates user profile
        
        :param combine_df: main dataframe
        :param user_id: donor ID
        :param project_k: filter donors with k number of unique projects
        :return: An array of user profile from weighted projects previously donated 
        """
        # contains all project's tfidf 
        projects_list = []
        
        # A list of donor's unique projects
        donor_unique_df = self._get_donor_unique_df(combine_df)
        # A list of past projects user donated towards
        donor_projects = self._get_users_projects(donor_unique_df, user_id, project_k)
        
        # build a list of project's tfidf
        for project in donor_projects:
            pj_index = project_df[project_df['Project ID'] == project].index.values[0]
            projects_list.append(self.tfidf_matrix[pj_index:pj_index+1])
            
        # stack the arrays of tfidf
        projects_matrix = vstack(projects_list)
        
        # obtain weights for each project
        weights = self._create_weights(donor_unique_df, user_id, project_k)
        
        # multiply weights with project matrix divide by weights
        multiply = np.sum(projects_matrix.multiply(weights), axis=0) / np.sum(weights)
        
        # normalize using sklearn
        weighted_user = normalize(multiply)
        
        return weighted_user
    
    
    def _get_donor_latest_donation_project_id(self, combine_df, user_id):
        """
         Obtains the latest donation made by the donor
        
        :param combine_df: main dataframe
        :param user_id: donor ID
        :return: project ID that donor last donated
        """
        
        latest_donation = combine_df[combine_df['Donor ID'] == user_id][-1:].values[0,0]
        return latest_donation
        
        
    def _create_weights(self, donor_unique_df, user_id, project_k):
        """
         Creates a list of weights. Most recent projects have the greatest influence on user profile
        
        :param donor_unique_df: donor's unique projects dataframe
        :param user_id: donor ID
        :param project_k: filter donors with k number of unique projects
        :return: weights for user profile's projects
        """
        # the most recent project has the highest weight
        weights = [1]
        
        # builds a list of weights that decrease over time
        for i in range(len(donor_unique_df.loc[user_id])-1):
            weights.append(weights[i] * 1)
        return np.array(weights[:project_k]).reshape(-1,1)
    
    
    def _get_donor_unique_df(self, combine_df):
        """
         Wrangles the data to provide donor's unique projects
        
        :param combine_df: main dataframe
        :return: donor's unique projects dataframe
        """
        
        donor_unique_projects = combine_df.groupby(['Donor ID', 'Project ID'])['Donation Amount'].sum().reset_index()
        donor_unique_df = donor_unique_projects[donor_unique_projects['Project ID'].isin(project_df['Project ID'])]\
        .set_index('Donor ID')
        
        return donor_unique_df
    
    
    def _get_users_projects(self, donor_unique_df, user_id, project_k=1):
        """
         Obtain donor's unique donated projects
        
        :param donor_unique_df: donor's unique projects dataframe
        :param user_id: donor ID
        :param project_k: filter donors with k number of unique projects
        :return: donor's list of projects
        """
    
        donor_projects = donor_unique_df.loc[user_id]
        return donor_projects['Project ID'][:project_k]
    
    
    def retrieve_similar_projects(self, combine_df, project_df, user_id, project_k):
        """
         Obtains the latest donation made by the donor
        
        :param combine_df: main dataframe
        :param project_df: projects dataframe
        :param user_id: donor ID
        :param project_k: filter donors with k number of unique projects
        :return: A list of tuples. Tuples contain cosine similarity value and project ID recommended by the system
        """
        # obtain project_id from last donation
        if self.tfidf_matrix == None:
            text = project_df['text']
            self._compute_tfidf(text)
        
        user_weight = self._compute_weighted_user(combine_df, user_id, project_k)
        
        cosine_similarities = cosine_similarity(user_weight, self.tfidf_matrix)
        similar_indices = cosine_similarities.argsort().flatten()[:-101:-1]
        recommend_projects = [(project_df['Project ID'][i], cosine_similarities[0,i]) for i in similar_indices]
        
        donor_unique_df = self._get_donor_unique_df(combine_df)
        project_list = list(self._get_users_projects(donor_unique_df,user_id, project_k))
     
        filtered_recommend_projects = [i for i in recommend_projects if i[0] not in project_list ]
        
        return filtered_recommend_projects[:10]    

In the model evaluation, we will look at users who have made a maxmimum of 5 unique donations. It will return an array of recall@10 for 4 recommendation/prediction intervals and an overall accuracy score.  

In [18]:
class ModelEvaluation:
    """
    A class to test the recommendation model
    """
    def _get_donors_list(self, combine_df, max_project=5):
        """
         Obtains the list of donors who have unique k donations 
        
        :param combine_df: main dataframe
        :param max_project: unique k donations 
        :return: list of unique donors
        """
        test = combine_df.groupby(['Donor ID','Project ID']).sum()
        test_data = test.groupby(by='Donor ID').filter(lambda x: len(x) == max_project)
        print("Number of test users:", len(set(test_data.reset_index()['Donor ID'])))
        return set(test_data.reset_index()['Donor ID'])
    
    def evaluate_user(self, model, combine_df, project_df, user_id, max_project = 5):
        """
         Evaluates the model for a user
        
        :param combine_df: main dataframe
        :param project_df: projects dataframe
        :param user_id: donor ID
        :param max_project: unique k donations  
        :return: list of scores indicating right/wrong recommendation
        """
        # stores the scores of recommendations for each prediction iterval
        score_list = []
        
        donor_unique_df = model._get_donor_unique_df(combine_df)
        full_projects = model._get_users_projects(donor_unique_df, user_id, max_project)
        
        # Verifies the recommendation/prediction to actual donation for each interval
        for project_k in range(1, max_project):
            recommendation_list = model.retrieve_similar_projects(combine_df, project_df, user_id, project_k)
            score = 0
            
            for project in recommendation_list:
                if full_projects.iloc[project_k] == project[0]:
                    score = 1
            score_list.append(score)
                      
        return score_list
    
    
    def evaluate_model(self, model, combine_df, project_df, max_project = 5):
        """
         Evaluates the model for each user
        
        :param model: Recommendation System model
        :param combine_df: main dataframe
        :param project_df: projects dataframe
        :param max_project: unique k donations  
        :return: recall and accuracy metric
        """
        score_array_list = []
        
        # obtain the list of users who have k number of unique donations
        users_list = self._get_donors_list(combine_df, max_project)
        
        # Process all users
        for user_id in users_list:
            array = self.evaluate_user(model, combine_df, project_df, user_id, max_project)
            score_array_list.append(coo_matrix(array))
        
        # Create a score matrix
        score_matrix = vstack(score_array_list)
        
        recall = np.sum(score_matrix, axis=0) / len(users_list)
        accuracy = np.sum(np.sum(score_matrix, axis=1), axis=0) / (len(users_list) * max_project-1)
        
        return recall, accuracy

In [19]:
test_model = ContentRecSystemWeightsTest()
eval_model = ModelEvaluation()
recall, accuracy = eval_model.evaluate_model(test_model, combine_df, project_df, max_project=5)

Number of test users: 105


From the evaluation model above, the results show the recall metric increasing as the model builds a user profile over time. Initally, the model only obtains a recall of less than 10%. Once the model has 4 past donations to build a user profile, it successfully recommended 25.7% correct projects to the donor. 

In [20]:
recall

matrix([[0.0952381 , 0.16190476, 0.17142857, 0.25714286]])

In [21]:
accuracy

matrix([[0.13740458]])

After tweaking the weighting system (which influences how past donations on the donor's profile), it seems donations made earlier continue to play an important role in future donations. In the end, it seems a uniform weighting system for all donations resulted in the highest recall and accuracy metric from the evaluation model. 

# 6. Future Improvements

Recommendation systems are not limited to the content-based approach. Collaborative-based approach and a hybridization of content-based and collaborative-based models may help provide better recommendations. 

Reasons for choosing to build a content-based model is the ['cold-start' problem](http://blog.untrod.com/2016/06/simple-similar-products-recommendation-engine-in-python.html) collaborative-based approach models suffer from, where new donor's have limited information for collaborative-based approach models to successfully recommend 'correct' projects. Since a majority of donors only donated a few times it makes perfect sense to start with a content-based model to encourage these donors to donate based on their profile. 

The model can be improved by:
- Creating a hybrid model of content-based and collaborative-based model, which may help provide better recommendations for regular donors.
- Make adjustments to the code for production. Since computations are expensive, user profile/data should be stored into a database.
- Incorporate non-text information into the model.