# Consumer Complaint Identification

_A consumer care department can receive thousands of complaints every day._

_The department needs to identify the nature of each complaint so that it can act on the most important ones first._

_The dataset is based on consumer complaints collected by the Consumer Financial Protection Bureau._

_Features in the dataset:_

* Date received: The date on which the complaint was received
* Product: The type of product the complaint is about
* Sub-product: The type of sub-product the complaint is about
* Issue: The issue reported by the consumer
* Sub-issue: The sub-issue reported by the consumer
* Consumer complaint narrative: Complete description of the complaint reported by the consumer

_GOAL:_

* To identify the category of the complaint filed by the consumer 
* To identify the most important issues to be addressed first 

_The Product column contains the name of the product the consumer had an issue with, so it is the target variable in this classification problem._

In [1]:
# importing the necessary libraries
import pandas as pd
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.linear_model import SGDClassifier
import nltk
import re
from nltk.corpus import stopwords
import string

In [2]:
# Importing the dataset
data = pd.read_csv("C:\\Users\\kumar\\OneDrive\\Desktop\\Projects\\Python project\\Consumer Complaint Classification\\consumer complaints dataset.csv")
print(data.head())

   Unnamed: 0 Date received  \
0           0    2022-11-11   
1           1    2022-11-23   
2           2    2022-11-16   
3           3    2022-11-15   
4           4    2022-11-07   

                                             Product  \
0                                           Mortgage   
1  Credit reporting, credit repair services, or o...   
2                                           Mortgage   
3                        Checking or savings account   
4                                           Mortgage   

                  Sub-product                           Issue  \
0  Conventional home mortgage  Trouble during payment process   
1            Credit reporting     Improper use of your report   
2                 VA mortgage  Trouble during payment process   
3            Checking account             Managing an account   
4      Other type of mortgage  Trouble during payment process   

                                       Sub-issue  \
0                                

The dataset contains one unnamed column (`Unnamed: 0`), which in the rows shown simply repeats the row index.

In [3]:
# Removing that unnamed column
data = data.drop("Unnamed: 0",axis=1)

In [4]:
# Checking whether there are any null values in the dataset
print(data.isnull().sum())

Date received                         0
Product                               0
Sub-product                      235294
Issue                                 0
Sub-issue                        683355
Consumer complaint narrative    1987977
dtype: int64


As we can see, there are many null values in the dataset, most notably 1,987,977 missing complaint narratives.

In [5]:
# Removing every row that has a null value in any column
data = data.dropna()
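
Called without arguments, `dropna()` discards a row if any column is null, including `Sub-product` and `Sub-issue`, which the model below does not use. The following is a minimal sketch on a made-up three-row frame (the values are hypothetical, not taken from the dataset) comparing that behavior with the `subset` argument, which restricts the check to chosen columns. Whether the extra rows are worth keeping is a judgment call for the real data.

```python
import numpy as np
import pandas as pd

# Toy frame (hypothetical values) mimicking the null pattern in the real dataset
toy = pd.DataFrame({
    "Product": ["Mortgage", "Debt collection", "Student loan"],
    "Sub-issue": [np.nan, "Debt is not yours", np.nan],
    "Consumer complaint narrative": [
        "Payment was not applied",
        "I do not owe this debt",
        np.nan,
    ],
})

# dropna() with no arguments drops every row that has a null in ANY column
strict = toy.dropna()

# Restricting the check to the columns the model uses keeps more rows
relaxed = toy.dropna(subset=["Consumer complaint narrative", "Product"])

print(len(toy), len(strict), len(relaxed))  # 3 1 2
```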

The Product column in the dataset contains the labels, which represent the nature of the complaints reported by consumers.

In [6]:
# Viewing all labels and their frequencies
print(data["Product"].value_counts())

Credit reporting, credit repair services, or other personal consumer reports    507582
Debt collection                                                                 192045
Credit card or prepaid card                                                      80410
Checking or savings account                                                      54192
Student loan                                                                     32697
Vehicle loan or lease                                                            19874
Payday loan, title loan, or personal loan                                         1008
Name: Product, dtype: int64


As we can see, most complaints concern "Credit reporting, credit repair services, or other personal consumer reports", followed by "Debt collection", while "Payday loan, title loan, or personal loan" has the fewest complaints.
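
To put the imbalance in numbers, the sketch below turns the counts printed above into class shares and into the weights given by the "balanced" heuristic, `n_samples / (n_classes * count)`. The names `counts`, `shares`, and `weights` are introduced here for illustration only. If the imbalance turns out to hurt the rare classes, passing `class_weight="balanced"` to `SGDClassifier` is one option to try; this notebook does not do so.

```python
import pandas as pd

# Counts copied from the value_counts() output above
counts = pd.Series({
    "Credit reporting, credit repair services, or other personal consumer reports": 507582,
    "Debt collection": 192045,
    "Credit card or prepaid card": 80410,
    "Checking or savings account": 54192,
    "Student loan": 32697,
    "Vehicle loan or lease": 19874,
    "Payday loan, title loan, or personal loan": 1008,
})

# Fraction of all complaints that falls in each class
shares = counts / counts.sum()

# "Balanced" weights: rare classes get proportionally larger weights
weights = counts.sum() / (len(counts) * counts)

print(shares.round(4))
print(weights.round(2))
```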

# Training Consumer Complaint Classification Model

The "Consumer complaint narrative" column contains the complete description of each complaint as reported by the consumer.

In [None]:
# Cleaning and preparing this column before using it in a Machine Learning model
nltk.download('stopwords')
stemmer = nltk.SnowballStemmer("english")
stopword = set(stopwords.words('english'))

def clean(text):
    text = str(text).lower()
    # Raw strings stop regex escapes such as \[ and \S from being read as string escapes
    text = re.sub(r'\[.*?\]', '', text)                 # text in square brackets
    text = re.sub(r'https?://\S+|www\.\S+', '', text)   # URLs
    text = re.sub(r'<.*?>+', '', text)                  # HTML-like tags
    text = re.sub('[%s]' % re.escape(string.punctuation), '', text)  # punctuation
    text = re.sub(r'\n', '', text)                      # newlines
    text = re.sub(r'\w*\d\w*', '', text)                # tokens containing digits
    # Drop stopwords, then stem the remaining words
    words = [word for word in text.split(' ') if word not in stopword]
    words = [stemmer.stem(word) for word in words]
    return " ".join(words)

data["Consumer complaint narrative"] = data["Consumer complaint narrative"].apply(clean)

In [7]:
# splitting the data into training and test sets
data = data[["Consumer complaint narrative", "Product"]]
x = np.array(data["Consumer complaint narrative"])
y = np.array(data["Product"])

cv = CountVectorizer()
X = cv.fit_transform(x)
X_train, X_test, y_train, y_test = train_test_split(X, y, 
                                                    test_size=0.33, 
                                                    random_state=42)
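
The cell above fits `CountVectorizer` on all narratives before splitting, so the vocabulary also includes words that occur only in the test rows. A possible variation, sketched here on a few made-up narratives (the texts, labels, and the names `cv_toy`, `text_train`, and `text_test` are hypothetical), splits the raw text first and learns the vocabulary from the training part alone.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import train_test_split

# Toy narratives and labels (hypothetical) standing in for the real columns
texts = [
    "incorrect account on my credit report",
    "credit bureau did not fix the error",
    "collector keeps calling about a debt",
    "this debt is not mine",
    "report shows a late payment I never made",
    "debt collector threatened legal action",
    "dispute on credit report was ignored",
    "collection agency will not validate the debt",
]
labels = ["Credit reporting", "Credit reporting", "Debt collection", "Debt collection",
          "Credit reporting", "Debt collection", "Credit reporting", "Debt collection"]

# Split the raw text first ...
text_train, text_test, y_train, y_test = train_test_split(
    texts, labels, test_size=0.25, random_state=42)

# ... then learn the vocabulary from the training texts only
cv_toy = CountVectorizer()
X_train = cv_toy.fit_transform(text_train)
X_test = cv_toy.transform(text_test)  # unseen test words are simply ignored

print(X_train.shape, X_test.shape)
```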

In [8]:
# Training the Machine Learning model using the Stochastic Gradient Descent classification algorithm
sgdmodel = SGDClassifier()
sgdmodel.fit(X_train,y_train) 
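
The split produces `X_test` and `y_test`, but the notebook never scores the model on them. The sketch below shows what such a check could look like, using `accuracy_score` and `classification_report` from `sklearn.metrics`. It runs on a small made-up corpus (all texts, labels, and the name `toy_model` are hypothetical), so the numbers it prints say nothing about the real model; the same two calls could be applied to `sgdmodel`, `X_test`, and `y_test`.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import SGDClassifier
from sklearn.metrics import accuracy_score, classification_report
from sklearn.model_selection import train_test_split

# Toy corpus (hypothetical) with two of the product labels
texts = [
    "incorrect account on my credit report",
    "credit bureau did not fix the error on my report",
    "report shows a late payment I never made",
    "dispute on credit report was ignored",
    "credit report lists an account that is not mine",
    "inaccurate information on my credit report",
    "collector keeps calling about a debt",
    "this debt is not mine",
    "debt collector threatened legal action",
    "collection agency will not validate the debt",
    "collector calls my workplace about the debt",
    "debt was already paid but collector keeps calling",
]
labels = ["Credit reporting"] * 6 + ["Debt collection"] * 6

X_toy = CountVectorizer().fit_transform(texts)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_toy, labels, test_size=0.33, random_state=42, stratify=labels)

toy_model = SGDClassifier(random_state=42)
toy_model.fit(X_tr, y_tr)

# Score the model on rows it did not see during training
y_pred = toy_model.predict(X_te)
accuracy = accuracy_score(y_te, y_pred)
print("Accuracy:", accuracy)
print(classification_report(y_te, y_pred, zero_division=0))
```

With classes as imbalanced as in this dataset, the per-class precision and recall in the report are usually more informative than overall accuracy.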

In [10]:
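# Using the trained model to make predictions
user = input("Enter a Text: ")
# A new variable name keeps the `data` DataFrame from being overwritten;
# predict() accepts the sparse matrix directly, so .toarray() is not needed
user_features = cv.transform([user])
output = sgdmodel.predict(user_features)
print("Consumer Complaint Classification Type = ",output)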

Enter a Text: Investigation took more than 30 days and nothing was changed when clearly there are misleading, incorrect, inaccurate items on my credit report..i have those two accounts attached showing those inaccuracies... I need them to follow the law because this is a violation of my rights!! The EVIDENCE IS IN BLACK AND WHITE ....
Consumer Complaint Classification Type =  ['Credit reporting, credit repair services, or other personal consumer reports']
