<h1 align="center"> Project Proposal </h1>
<h2 align="center"> Consumer Financial Protection Bureau Data Analytics </h2>
<h3 align="center"> IST 5520: Data Science and Machine Learning with Python </h3>
<h3 align="center"> By: Group 2 </h3>
<h3 align="center"> Date: 9/26/2022 </h3>

# I. Introduction

The U.S. federal government employs several tools and systems to manage and support the economy. The federal funds rate, whose target is set by the Federal Reserve, is among the most important of these tools. This rate affects monetary and financial conditions, which in turn affect crucial components of the larger economy such as employment, growth, and inflation. Through these economic effects (Chen, 2022), the federal funds rate also influences banks' prime lending rates, that is, the rates banks charge their most creditworthy borrowers, and hence affects virtually everything from car loans to home loans.

In 2007, a liberal market of cheap credit and lending produced one of the world’s worst financial crises. After the September 11, 2001 terrorist attacks, and with the aim of boosting the economy and business activity, the Federal Reserve lowered one of the most important interest rates in the U.S. economy, the federal funds rate, from 6.51% in October 2000 to 0.98% in December 2003 (FRED, 2022).

Lowering the federal funds rate increased the liquidity available to banks, which spurred economic activity and trade. People were able to borrow at low interest rates, and even subprime borrowers (those with higher credit risk) were able to realize their dream of owning a house (Singh, 2022). These low mortgage rates caused house prices to rise quickly (U.S. Federal Housing Finance Agency, 2022).

Eventually, home ownership among the population reached a peak of 69.2% (U.S. Census Bureau, 2022), and, as a result, house prices went down. Many houses were then worth less than the loans taken out to purchase them. Adjustable-rate mortgages exacerbated the problem, leaving subprime borrowers unable to pay their debts.

Consequently, several subprime lenders became insolvent and filed for bankruptcy (U.S. Securities and Exchange Commission, 2007). As asset prices deteriorated, major financial institutions around the world, such as the Swiss bank UBS and the Bank of England, reported losses in the trillions of dollars (Hanson & Essenburg, 2014). With major stock markets hit as well and over 500 financial institutions liquidated, the U.S. economy entered the Great Recession.

To restore confidence in the economy, the U.S. Congress passed the Dodd-Frank Wall Street Reform and Consumer Protection Act in 2010. On the financial front, the legislation curtailed some of the largest banks' riskier activities, enhanced government control of their operations, and required them to hold higher cash reserves. On the consumer side, it sought to reduce predatory lending (Govinfo.gov, n.d.).

The act also created the Consumer Financial Protection Bureau (CFPB). This "watchdog" agency gave consumers a platform to raise their concerns and allowed the government to intervene on behalf of individual consumers, standardize mortgage paperwork, and protect the mortgage industry and the economy, specifically by prosecuting large banks for defrauding or manipulating customers and by limiting predatory lenders (Cordray, 2020).

When analyzed, the Consumer Financial Protection Bureau (CFPB) complaints data can help expose harmful communication tactics used by financial institutions, inaccurate disclosures to customers, or discriminatory lending practices (Booker, 2015). Building on this, this project analyzes the available dataset to identify the most commonly reported financial issues, trace trends in the reported issues, and highlight any fraudulent financial behavior.


# II. Literature Review

Service quality refers to the value a consumer obtains from a particular product or service (Pevec & Pisnik, 2018). Customer satisfaction and willingness are related to service quality (School et al., 2016). Consequently, satisfaction is a major factor in determining a company's service quality. Service quality and customer satisfaction have also been found to be related to repurchase intentions through client loyalty (Upamannyu, Gulati, & Chack, 2015; Fida et al., 2020).

Perceived Service Quality (PSQ) is claimed to be the cornerstone of every service firm's existence (Purcarea, 2016; Bobocea et al., 2016). PSQ has a significant impact on customer satisfaction (Purcarea, 2016), and customer satisfaction is an antecedent of loyalty. Customer satisfaction is therefore indispensable when assessing a provider's service quality, and complaint management is seen as an essential component in determining that quality (Bogale, Kassa, & Ali, 2015).

Complaint management is required to improve a service firm's product and service quality (Ahmed & Amir, 2011). Nguyen et al. (2022) examined the impact of PSQ and patient complaints (PC) on patient satisfaction (PS). The study's findings demonstrated that PSQ had a positive influence on PC and PS. This indicates that PC is a mediating factor in the PSQ-PS relationship. Managers should therefore treat complaint handling as a mediating function for improving service quality and satisfaction, and thereby for retaining customer loyalty. An emphasis on perceived quality and customer complaints increases loyalty and repurchase intent (Nguyen et al., 2022).

Consumer complaints and managerial responses were first addressed by Resnik & Harmon (1983). In the authors’ view, consumer satisfaction is one of the most important attributes of our economic system. When confronted with a complaint, a marketing manager must “define and clarify company attitudes toward complaint handling” and “establish procedures for implementing company attitudes.” The manager must balance consumer satisfaction with company policy. The authors noted that most research up to that point had focused on the complainant and often varied in its findings. The paper attempted to answer how a complaint should be handled and which part of the organization is responsible for it.

The methodology of the study was as follows. In the first phase, random consumers read one of five complaint letters written by the paper’s authors. The consumers were invited to provide a suitable response and were asked who at the firm they believed should respond. The letters were addressed to an unnamed Fortune 500 firm that produces building materials such as paneling and plywood. The second phase investigated the response from the branch managers. “The branch managers were asked to provide information about company objectives and policies, the employee designated to respond, perceived complainant motivations and the legitimacy of the complaint.” The authors then compared these responses to what the consumers said they wanted.

The study found that consumers rated the letters as about as legitimate as, or slightly more legitimate than, managers did. However, one of the vaguely worded letters was more controversial: consumers found it legitimate substantially more often than managers did.

When managers believed a complaint was not legitimate, the top three reasons were: 1) “consumers wanted something for nothing”, 2) “consumer confusion” and 3) “consumer incorrectly believed he/she was right”.
Both customers and managers were generally hopeful about the reaction to the complaint, with managers citing customer happiness and company protection as their top two goals.

Managers overwhelmingly agreed that complaints should be dealt with at the level of organization closest to the problem, while customers did not have a clear preference, either owing to ignorance of the firm's structure or a lack of concern.
Managers tended to prefer personal contact, but most customers thought a letter sufficed. Managers were also willing to respond more aggressively than customers expected.

This pilot study in the field of consumer complaints produced many insights on which much later research was built. An important caveat is that its findings may be skewed: because the letters and scenarios were fictitious, the consumers and managers surveyed did not have the same emotional connection that actual complainants and managers would have, which lowered consumers' expectations. Consumers who had previously written legitimate complaints were generally more forgiving of answers than those who had not, indicating that experienced consumers may have more realistic expectations than new ones. Experienced managers were also more likely to respond favorably to customers.

Decades later, spurred by sociopolitical and economic considerations and enabled by computing technology, the CFPB identifies itself as a "21st-century agency" that is "data-driven," with technology being "critical to the CFPB's mission" (CFPB, 2013; Cordray, 2014). The CFPB's strategic plan indicates that it intends to "use data purposefully, to analyze and distill information to enable informed decision-making in all internal and external tasks" (CFPB, 2013).

Importantly, reporting venues such as the CFPB are “making it easier to be heard” (Shammout & Haddad, 2014). The CFPB's website has made it exceedingly simple to file a complaint against a bank: customers no longer need to call and navigate complicated automated systems, and they receive a written record of who said what and when, which may be useful if the problem is not resolved on the first try (Shammout & Haddad, 2014).

The CFPB's director, Richard Cordray, has also stressed the agency's reliance on data in its rulemaking, asserting: "Before we finalize our rules, we conduct research and solicit input from all stakeholders – consumer advocates, industry members, and public officials. The best decisions will be those that are best informed" (Cordray, 2013).

Since then, the bureau has indeed gathered data to provide well-informed analyses for policy makers, for rulemaking, and for the public (Horn, 2017). The CFPB now holds over 10.7 million consumers’ credit information for 25-75 credit card accounts. Additionally, the bureau collects data on over 700,000 automobile sales per month from 46 states and compares them with the credit information (Government Accountability Office, 2014).

This information is used by the CFPB to monitor risk in financial markets, assess risk at firms, and prioritize agency action. The CFPB makes complaint data and analytics available to CFPB employees to assist with supervisory, enforcement, and market monitoring efforts. The CFPB also makes complaint data available to other federal and state authorities, as well as the general public.

Companies can also use the available complaint information to learn more about their own firm, their rivals, and the industry as a whole. Consumer complaints can be a sign of probable risk management flaws or other problems, such as legal or regulatory infractions. Complaints might expose a flaw in a certain product, service, function, department, or vendor. Complaints can also be used to find opportunities to improve consumers' experiences with, and comprehension of, consumer financial products and services.


# III. Research Questions

## The dataset will allow us to explore the following research questions:

### 1. Time-phased spike in complaints:

Because each complaint is recorded with its submission date, it will be possible to investigate general patterns in consumer complaints, whether they are rising or falling over time, and whether there are clear time-related trends among specific complaints. The same analysis can be applied to individual firms by tracking the increase or decrease in complaints against them over time. The former focuses on the economy's financial well-being, whereas the latter reflects the financial strength and good conduct of enterprises.
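
As a rough illustration of this kind of time-phased analysis, the sketch below counts complaints per calendar month, overall and per company. It is not the project's final method: the six rows are invented toy data, and only the column names `Date received` and `Company` are taken from the data dictionary.

```python
import pandas as pd

# Toy stand-in for the CFPB extract (invented rows, real column names).
complaints = pd.DataFrame({
    "Date received": ["12/1/2011", "12/15/2011", "1/3/2012",
                      "1/20/2012", "1/28/2012", "3/9/2012"],
    "Company": ["BANK A", "BANK B", "BANK A", "BANK A", "BANK B", "BANK A"],
})

# Parse the mm/dd/yyyy strings into timestamps.
complaints["Date received"] = pd.to_datetime(
    complaints["Date received"], format="%m/%d/%Y"
)

# Complaints per calendar month; months with no complaints appear as 0.
monthly_counts = complaints.set_index("Date received").resample("MS").size()

# Same idea per company: one row per month, one column per company.
monthly_by_company = (
    complaints
    .groupby([pd.Grouper(key="Date received", freq="MS"), "Company"])
    .size()
    .unstack(fill_value=0)
)

print(monthly_counts)
print(monthly_by_company)
```

On the full dataset, a rolling mean over `monthly_counts` would be one simple way to separate a sustained rise from a one-month spike.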

### 2. How common are certain customer complaints and what is the distribution of complaints:

Similar to the time-phased trend analysis, we will examine the most frequently reported complaints and the distribution of complaints, both overall and by financial company. The former would provide an overview of concerns that are most likely tied to the larger economy, whilst the latter would reflect practices at the respective firms.
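
The distribution described here can be sketched with a frequency table and a company-by-issue cross-tabulation. The rows below are invented; `Company` and `Issue` are the column names from the data dictionary, and the issue labels are placeholders only.

```python
import pandas as pd

# Toy data: invented rows with the dataset's column names.
complaints = pd.DataFrame({
    "Company": ["BANK A", "BANK A", "BANK A", "BANK B", "BANK B", "BANK C"],
    "Issue": ["Loan modification", "Loan modification", "Billing disputes",
              "Loan modification", "Loan modification", "Billing disputes"],
})

# Overall distribution: share of all complaints attributed to each issue.
issue_share = complaints["Issue"].value_counts(normalize=True)

# Distribution by company: rows are companies, columns are issues.
issue_by_company = pd.crosstab(complaints["Company"], complaints["Issue"])

# Most frequent issue for each company.
top_issue_per_company = issue_by_company.idxmax(axis=1)

print(issue_share)
print(issue_by_company)
print(top_issue_per_company)
```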

### 3. What are the response times in companies:

Here the focus is on whether organizations respond within 15 days and on the distribution of current complaint statuses. We want to provide a visual representation of categorical variables such as closed and ongoing reports, as well as monetary and non-monetary relief. Graphical depictions of these fields will help us identify trends and patterns and highlight any obvious outliers. The dataset also lets us explore whether specific kinds of complaints are addressed first while others take more time.
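
One possible way to summarize response behavior before plotting is shown below. The column names come from the data dictionary; the rows and the specific response labels are invented for illustration.

```python
import pandas as pd

# Toy data: invented rows with the dataset's column names.
complaints = pd.DataFrame({
    "Product": ["Mortgage", "Mortgage", "Credit card",
                "Credit card", "Credit card"],
    "Company response to consumer": [
        "Closed without relief", "In progress", "Closed without relief",
        "Closed with relief", "Closed without relief",
    ],
    "Timely response?": ["Yes", "No", "Yes", "Yes", "Yes"],
})

# Turn the Yes/No flag into a real boolean so that means are response rates.
timely = complaints["Timely response?"].map({"Yes": True, "No": False})

# Share of complaints answered within the 15-day window, per product.
timely_rate_by_product = timely.groupby(complaints["Product"]).mean()

# How often each type of company response occurs.
response_distribution = complaints["Company response to consumer"].value_counts()

print(timely_rate_by_product)
print(response_distribution)
```

Both results are plain pandas Series, so they can be passed directly to a bar chart once a plotting library is chosen.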

### 4. Trends and categories in complaints:

Because of the large scope of our dataset, it would be hard to evaluate each and every complaint separately. The dataset's categories are a key point of differentiation which we could utilize to identify patterns. If a particular group of complaints substantially overlaps with, for instance, a zip code or collection of zip codes, we may be able to make inferences that we would not have been able to reach otherwise. If a certain organization is getting a higher-than-average volume of complaints of a certain category, we would be able to observe that as well.
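
To make "higher-than-average volume of complaints of a certain category" concrete, one option is to compare each company's product mix with the mix across all complaints. This is only a sketch with invented rows; the ratio `relative_share` is a name introduced here, not a field of the dataset.

```python
import pandas as pd

# Toy data: invented rows with the dataset's column names.
complaints = pd.DataFrame({
    "Company": ["BANK A"] * 4 + ["BANK B"] * 4,
    "Product": ["Mortgage", "Mortgage", "Mortgage", "Credit card",
                "Mortgage", "Credit card", "Credit card", "Credit card"],
})

# Share of each product within each company (each row sums to 1).
company_mix = pd.crosstab(
    complaints["Company"], complaints["Product"], normalize="index"
)

# Share of each product across all complaints.
overall_mix = complaints["Product"].value_counts(normalize=True)

# Ratio above 1: the company draws relatively more complaints in that
# category than the dataset as a whole.
relative_share = company_mix.div(overall_mix, axis="columns")

print(relative_share)
```

The same cross-tabulation works with ZIP code or state in place of `Company` to look for geographic concentration.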

### 5. Rural vs urban distribution of customer complaints via zipcodes:

We will study client data to learn more about their behavior. Data collection and organization are critical steps in analyzing and comprehending information and making business choices. Companies that employ consumer analytics extensively might generate increased returns on investment (ROI) and profit. The consumer complaint dataset includes the consumer's state and zip code, which allows us to sort and analyze this structured dataset. Storing client data is critical, and using zip codes as a demographic baseline allows us to gain a better understanding of our analyses. 

Policy makers should be concerned if complaints come disproportionately from traditionally underserved communities. Our analysis will focus on whether the complaints come predominantly from minority communities and on the ethnicity of those communities. Language barriers and/or immigrant status may lead to lower complaint rates in mostly Hispanic ZIP codes than in predominantly African-American ZIP codes. We are attempting to determine which communities are more likely to file complaints and whether their complaints are more or less likely to be addressed.
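
The ZIP-based comparison requires joining the complaints to an external ZIP-level table, which the dataset itself does not provide. The sketch below assumes a hypothetical lookup table `zip_lookup` with an `area_type` column; the classifications shown are invented placeholders, not real rural or urban designations.

```python
import pandas as pd

# Toy complaints with ZIP codes stored as integers, as in the data dictionary.
complaints = pd.DataFrame({
    "Complaint ID": [1, 2, 3, 4],
    "ZIP Code": [60302, 60302, 65401, 501],
})

# Hypothetical external lookup; the area_type values are invented.
zip_lookup = pd.DataFrame({
    "zip": ["60302", "65401"],
    "area_type": ["urban", "rural"],
})

# ZIP codes are identifiers, not quantities: normalize them to
# zero-padded five-character strings before joining.
complaints["zip"] = complaints["ZIP Code"].astype(str).str.zfill(5)

# Left join keeps every complaint, even when the ZIP is not in the lookup.
merged = complaints.merge(zip_lookup, on="zip", how="left")
merged["area_type"] = merged["area_type"].fillna("unknown")

complaints_by_area = merged["area_type"].value_counts()
print(complaints_by_area)
```

Keeping unmatched ZIP codes under an explicit `unknown` label makes it visible how much of the dataset the lookup actually covers.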


### 6. Median income of customer complaints via zipcode:

In addition to rural vs urban ZIP codes, median income could also be a differentiating factor in which complaints are filed and how they are handled. Individuals and households with lower earnings may have different difficulties and requirements than those with higher incomes, making this an important point to measure. It is also possible that banks and companies prioritize customers with higher incomes, and it would be interesting to see whether this is reflected in the data.
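
A possible shape for the income comparison is sketched below. The ZIP-level table `zip_income`, its `median_income` values, and the band cut-offs are all invented assumptions; a real analysis would need an external income source and a justified choice of bands.

```python
import pandas as pd

# Toy complaints, already carrying a normalized five-character ZIP string.
complaints = pd.DataFrame({
    "zip": ["60302", "60302", "65401", "10001", "10001", "10001"],
    "Timely response?": ["Yes", "Yes", "No", "Yes", "No", "Yes"],
})

# Hypothetical ZIP-level median household income (invented numbers).
zip_income = pd.DataFrame({
    "zip": ["60302", "65401", "10001"],
    "median_income": [90000, 45000, 70000],
})

merged = complaints.merge(zip_income, on="zip", how="left")
merged["timely"] = merged["Timely response?"].eq("Yes")

# Assumed, illustrative income bands (upper edges are inclusive).
merged["income_band"] = pd.cut(
    merged["median_income"],
    bins=[0, 50000, 80000, float("inf")],
    labels=["low", "middle", "high"],
)

# Complaint volume and timely-response rate per income band.
summary = merged.groupby("income_band", observed=True).agg(
    complaints=("zip", "count"),
    timely_rate=("timely", "mean"),
)
print(summary)
```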

# IV. Data

The customer complaints data collection published by the Consumer Financial Protection Bureau (CFPB) spans 12 years (2011-2022). A total of 1,048,575 consumer complaints were acquired from the CFPB public online database in September 2022. Each record contains the issue raised in the complaint, classified by product (bank account, credit card, mortgage, etc.) and recorded against the complaint date, location (state and ZIP code), and the financial institution in question. The dataset also records how the complaint was submitted, whether by phone, website, or referral, which may provide insight into submission trends when compared with the nature of the complaint, its urgency, and its location (for example, rural areas with limited access to computers versus urban areas).

Importantly, each complaint includes a Boolean (yes/no) field indicating whether the financial institution responded within 15 days of the complaint filing. Comparing timely responses across complaint themes shows which concerns take longer to resolve. If particular financial institutions show a pattern of protracted response times for specific kinds of complaints, while other institutions reply to the same kinds quickly, we may be able to identify corporations that are potentially engaging in fraudulent activities. Evaluating the regional distribution of response times could reinforce such a signal and raise the red flag of demographic-based fraud.

Fraud victimization occurs in a variety of populations and differs between urban and rural locations throughout all 50 states. Fraudulent operations include payday loan, student debt relief, health care, and business opportunity scams. In its most recent survey, the Federal Trade Commission (FTC) found that around 16% of Americans are defrauded each year (Anderson, 2019). Authorities can reduce the frequency of fraud by identifying which categories of customers are more likely to be victims of particular types of fraud.
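
The fields described in this section could be loaded roughly as follows. This is a sketch under stated assumptions: the two CSV rows are invented, an in-memory string stands in for the downloaded file, and reading ZIP codes as text (to keep leading zeros) is a design choice made here rather than something the source prescribes.

```python
import io
import pandas as pd

# In-memory stand-in for the downloaded CSV (invented rows, real column names).
sample_csv = io.StringIO(
    "Date received,Product,Company,State,ZIP Code,Timely response?,Complaint ID\n"
    "12/1/2011,Mortgage,BANK A,IL,60302,Yes,2102\n"
    "1/5/2012,Credit card,BANK B,NJ,07302,No,2103\n"
)

complaints = pd.read_csv(
    sample_csv,
    dtype={
        "ZIP Code": "string",   # keep leading zeros
        "Product": "category",  # few distinct values, saves memory
        "State": "category",
    },
)

# Dates arrive as mm/dd/yyyy text; convert them explicitly.
complaints["Date received"] = pd.to_datetime(
    complaints["Date received"], format="%m/%d/%Y"
)

# Map the Yes/No flag to a boolean column for later aggregation.
complaints["timely"] = complaints["Timely response?"].map(
    {"Yes": True, "No": False}
)

print(complaints.dtypes)
```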

Furthermore, the data contains a Boolean field indicating whether the consumer disputed the company's response. Comparing this field across companies, locations, and issue categories would help us better understand possibly fraudulent patterns at certain firms or branches, or identify localities whose residents tend to dispute their outcomes further.
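
A dispute-rate comparison might look like the sketch below. The rows are invented, and the minimum of three complaints per company is an arbitrary threshold chosen only to show why very small groups should be filtered out before ranking.

```python
import pandas as pd

# Toy data: invented rows with the dataset's column names.
complaints = pd.DataFrame({
    "Company": ["BANK A"] * 4 + ["BANK B"] * 4 + ["BANK C"],
    "Consumer disputed?": ["Yes", "Yes", "No", "No",
                           "No", "No", "No", "Yes",
                           "Yes"],
})

# Boolean flag so that the mean of the column is the dispute rate.
complaints["disputed"] = complaints["Consumer disputed?"].eq("Yes")

dispute_summary = (
    complaints.groupby("Company")
    .agg(complaints=("disputed", "count"), dispute_rate=("disputed", "mean"))
    .sort_values("dispute_rate", ascending=False)
)

# Rates based on very few complaints are unreliable; keep larger groups only.
MIN_COMPLAINTS = 3  # arbitrary illustrative threshold
reliable = dispute_summary[dispute_summary["complaints"] >= MIN_COMPLAINTS]

print(dispute_summary)
print(reliable)
```

Replacing `"Company"` with `"State"` or `"Issue"` in the `groupby` gives the location and issue-category views mentioned above.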

Notably, the time-phased analysis of complaints is useful for detecting yearly and seasonal trends and peak times in complaints, as well as for comparing any anomalies with external financial factors not reported in the dataset, such as the effect of the COVID-19 pandemic on the economy and, as a result, on companies' practices and consumers' complaints.


Data Dictionary: the comprehensive data dictionary is provided below:

In [2]:
# Build the data dictionary as a pandas DataFrame
import pandas as pd

# Headers of the data dictionary table
columns = ['Column Name', 'Description', 'Data Type', 'Example']

# One tuple per dataset column: (name, description, data type, example)
rows = [
    ('Date received', 'Date of complaint submission', 'Date (mm/dd/yyyy)', '12/1/2011'),
    ('Product', 'Product concerned in the complaint', 'String', 'Mortgage'),
    ('Sub-product', 'Specific sub-product concerned', 'String', 'FHA mortgage'),
    ('Issue', 'Specific issue with sub-product', 'String', 'Loan modification'),
    ('Sub-Issue', 'Further details on issue', 'String', 'Information belongs to someone else'),
    ('Consumer complaint narrative', "Consumer's story on what happened", 'String', 'Bank of America randomly closed my account.'),
    ('Company', 'The concerned financial institution', 'String', 'BANK OF AMERICA'),
    ('State', 'The state the consumer belongs to', 'String', 'MO'),
    ('ZIP Code', "ZIP Code of consumer's county", 'Integer', '60302'),
    ('Tags', '(if applicable) tags of the consumer', 'String', 'Servicemember/Older American'),
    ('Consumer consent provided?', 'Whether consumer provided consent', 'String', 'Consent provided'),
    ('Submitted via', 'Platform of complaint submittal', 'String', 'Phone'),
    ('Date sent to company', 'Date on which CFPB sent complaint to the respective company', 'Date (mm/dd/yyyy)', '12/1/2011'),
    ('Company response to consumer', 'Line of action provided by company to consumer', 'String', 'Closed without relief'),
    ('Timely response?', 'Timely response of the company within 15 days from sending complaint', 'Boolean', 'Yes/No'),
    ('Consumer disputed?', "Whether consumer disputed the company's resolution", 'Boolean', 'Yes/No'),
    ('Complaint ID', 'Unique ID of the complaint', 'Integer', '2102'),
]

# The default integer index (0, 1, 2, ...) numbers the dataset columns
df = pd.DataFrame(rows, columns=columns)

df

| | Column Name | Description | Data Type | Example |
|---|---|---|---|---|
| 0 | Date received | Date of complaint submission | Date (mm/dd/yyyy) | 12/1/2011 |
| 1 | Product | Product concerned in the complaint | String | Mortgage |
| 2 | Sub-product | Specific sub-product concerned | String | FHA mortgage |
| 3 | Issue | Specific issue with sub-product | String | Loan modification |
| 4 | Sub-Issue | Further details on issue | String | Information belongs to someone else |
| 5 | Consumer complaint narrative | Consumer's story on what happened | String | Bank of America randomly closed my account. |
| 6 | Company | The concerned financial institution | String | BANK OF AMERICA |
| 7 | State | The state the consumer belongs to | String | MO |
| 8 | ZIP Code | ZIP Code of consumer's county | Integer | 60302 |
| 9 | Tags | (if applicable) tags of the consumer | String | Servicemember/Older American |
| 10 | Consumer consent provided? | Whether consumer provided consent | String | Consent provided |
| 11 | Submitted via | Platform of complaint submittal | String | Phone |
| 12 | Date sent to company | Date on which CFPB sent complaint to the respective company | Date (mm/dd/yyyy) | 12/1/2011 |
| 13 | Company response to consumer | Line of action provided by company to consumer | String | Closed without relief |
| 14 | Timely response? | Timely response of the company within 15 days from sending complaint | Boolean | Yes/No |
| 15 | Consumer disputed? | Whether consumer disputed the company's resolution | Boolean | Yes/No |
| 16 | Complaint ID | Unique ID of the complaint | Integer | 2102 |


# V. References

Bogale, A. L., H. B. Kassa, and J. H. Ali. (2015). Patients’ perception and satisfaction on quality of laboratory malaria diagnostic service in Amhara Regional State, North West Ethiopia. Malaria Journal 14 (1):1–7. doi: 10.1186/s12936-015-0756-6.

Booker, S. W. (2015, November 18). Mortgage servicing and debt collection among major issues according to EY analysis of Consumer Financial Protection Bureau (CFPB) complaint data: Analysis shows complaints increased monthly to more than 14,100 in 2015 from 9,022 in 2013. PR Newswire. Retrieved from https://www.proquest.com/wire-feeds/mortgage-servicing-debt-collection-among-major/docview/1733899748/se-2

CFPB. (2013). "Consumer Financial Protection Bureau Strategic Plan FY 2013 - FY 2017," http://www.consumerfinance.gov/strategic-plan/

Chen, J. (2022). Federal funds rate: What it is, how it's determined, and why it's important. Investopedia. Retrieved September 26, 2022, from https://www.investopedia.com/terms/f/federalfundsrate.asp

Cordray, R. (2020). Watchdog: How protecting consumers can save our families, our economy, and our democracy. Oxford University Press.

Cordray, R. (2013). "Director Cordray Remarks at the American Mortgage Conference," Raleigh, NC (September 11), http://www.consumerfinance.gov/newsroom/director-cordray-remarks-at-the-american-mortgage-conference/

FRED. (2022). Federal funds effective rate. FRED. Retrieved September 21, 2022, from https://fred.stlouisfed.org/series/FEDFUNDS

Government Accountability Office. (2014). "Consumer Financial Protection Bureau: Some Privacy and Security Procedures for Data Collections Should Continue Being Enhanced," http://www.gao.gov/products/GAO-14-758

Govinfo.gov. (n.d.). Dodd-Frank Wall Street Reform and Consumer Protection Act. Retrieved September 24, 2022, from https://www.govinfo.gov/app/details/COMPS-9515

Hanson, L. K. & Essenburg, T. J. (2014). The New Faces of American Poverty: A Reference Guide to the Great Recession, Page 18. ABC-CLIO.

Horn, R. (2017). Policy watch: The consumer financial protection bureau's consumer research: Mission accomplished? Journal of Public Policy & Marketing, 36(1), 170-183. https://doi.org/10.1509/jppm.17.037

Nguyen, T. L. (2022). Complaints management increasing perceived quality and satisfaction. Hospital Topics, 1–8. https://doi.org/10.1080/00185868.2022.2064788 

Pevec, T., and A. Pisnik. (2018). Empirical evaluation of a conceptual model for the perceived value of health services. Zdravstveno Varstvo 57 (4):175–82. doi: 10.2478/sjph-2018-0022.

Purcarea, T. V. (2016). Creating the ideal patient experience. Journal of Medicine and Life 9 (4):380–5.

Raval, D. (2021). Who is victimized by fraud? Evidence from consumer protection cases. Journal of Consumer Policy, 44(1), 43-72.

Resnik, A. J., & Harmon, R. R. (1983). Consumer complaints and managerial response: A holistic approach. Journal of Marketing, 47(1), 86-97.

Shammout, M. Z., & Haddad, S. I. (2014). The impact of complaints' handling on customers' satisfaction: empirical study on commercial banks' clients in Jordan. International Business Research, 7(11), 203.

U.S. Census Bureau. (2022). Homeownership Rate in the United States [RHORUSQ156N], retrieved from FRED, Federal Reserve Bank of St. Louis; https://fred.stlouisfed.org/series/RHORUSQ156N, September 25, 2022.

U.S. Federal Housing Finance Agency. (2022). All-Transactions House Price Index for the United States [USSTHPI]. FRED, Federal Reserve Bank of St. Louis. Retrieved September 21, 2022, from FRED; https://fred.stlouisfed.org/series/USSTHPI

U.S. Securities and Exchange Commission. (2007). "New Century Financial Corporation Files for Chapter 11; Announces Agreement to Sell Servicing Operations." https://www.sec.gov/Archives/edgar/data/1287286/000129993307002129/exhibit1.htm

Upamannyu, N. K., C. Gulati, and A. Chack. (2015). The effect of customer trust on customer loyalty and repurchase intention: The moderating influence of perceived CSR. International Journal of Research in IT, Management and Engineering 5 (4):1–31.
