This repository was archived by the owner on Apr 8, 2025. It is now read-only.
Merged
47 changes: 47 additions & 0 deletions python/sql/messy_data/messy_data.py
@@ -0,0 +1,47 @@
class MessyData():
    """
## Messy Data
Your company has an internal policy for determining customers' credit limits, but the board has recently questioned
this procedure as too conservative.
Your CEO wants to raise credit limits across the current customer base in order to upsell a new line of products.
To do that, the company hired several external consultancies to produce new credit limit estimates.
The problem is that each agency has produced the report in its own format. Some use the format "First-name Last-name"
to identify a person, others use the format "Last-name, First-name". There is also no consensus on how to capitalize each word,
so some used all uppercase, others used all lowercase, and some used mixed-case. Internally, the data is structured as follows:

Table: customers
================
id: INT
first_name: TEXT
last_name: TEXT
credit_limit: FLOAT


Table: prospects
================
full_name: TEXT
credit_limit: FLOAT


Keep in mind that the agencies had access to only part of the customer base. More than one agency may also have
prospected the same customer, so it's highly likely that there will be duplicates. Finally, they've prospected customers that
were not in your customer base as well.
For this task you are interested in the prospected customers who are already in your customer base and whose prospected
credit limit is higher than your internal estimate. When more than one agency prospected the same customer, choose the highest estimate.

You have to produce a report with the following fields:

first_name
last_name
old_limit [the current credit_limit]
new_limit [the highest credit_limit found]

To solve this exercise you may use any one, or a combination, of:
- Python and/or Pandas
- SQL or an equivalent DSL
- an ETL tool

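    """

    def report(self):
        # Placeholder until the exercise is solved; NotImplementedError is the
        # conventional exception for an unimplemented method.
        raise NotImplementedError("Not Implemented.")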
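One possible approach to the exercise described in the docstring, using only the Python standard library, is sketched below. It is not the intended or reference solution. The names `normalize_name` and `build_report`, and the tuple layouts for the two tables, are hypothetical and chosen for illustration. The sketch assumes that a name without a comma has the surname as its last token, and that a (first name, last name) pair identifies a customer uniquely.

```python
from typing import Dict, List, Tuple

# Assumed row shapes, mirroring the tables in the docstring.
Customer = Tuple[int, str, str, float]  # id, first_name, last_name, credit_limit
Prospect = Tuple[str, float]            # full_name, credit_limit
NameKey = Tuple[str, str]               # lowercase (first, last)


def normalize_name(full_name: str) -> NameKey:
    """Return a lowercase (first, last) key for either supported name format."""
    name = " ".join(full_name.split())  # collapse stray whitespace
    if "," in name:
        # "Last-name, First-name"
        last, first = (part.strip() for part in name.split(",", 1))
    else:
        # "First-name Last-name": assume the final token is the surname
        first, _, last = name.rpartition(" ")
    return first.lower(), last.lower()


def build_report(
    customers: List[Customer], prospects: List[Prospect]
) -> List[Tuple[str, str, float, float]]:
    """Return (first_name, last_name, old_limit, new_limit) rows."""
    # Keep only the highest prospected limit per normalized name,
    # which also resolves duplicates across agencies.
    best: Dict[NameKey, float] = {}
    for full_name, limit in prospects:
        key = normalize_name(full_name)
        best[key] = max(limit, best.get(key, limit))

    rows = []
    for _id, first, last, old_limit in customers:
        new_limit = best.get((first.strip().lower(), last.strip().lower()))
        # Report only existing customers whose prospected limit is higher.
        if new_limit is not None and new_limit > old_limit:
            rows.append((first, last, old_limit, new_limit))
    return rows
```

With customers `Jane Doe` (limit 1000.0) and `John Smith` (limit 5000.0), and prospects `"JANE DOE"` at 1500.0, `"doe, jane"` at 2000.0 and `"Smith, John"` at 4000.0, the sketch reports only Jane Doe, with the higher of her two estimates. Two customers sharing the same name would be indistinguishable under this key, which the exercise text does not address.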
8 changes: 8 additions & 0 deletions python/sql/messy_data/test_messy_data.py
@@ -0,0 +1,8 @@
import unittest

from .messy_data import MessyData

class TestMessyData(unittest.TestCase):

    def test_report(self):
        pass
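The empty test could eventually check the report against a small fixture. The sketch below shows one way that might look if the SQL option from the exercise were chosen. It uses an in-memory SQLite database from the standard library. `REPORT_SQL`, `run_report`, and `TestMessyDataSketch` are hypothetical names, not part of this pull request. The query assumes names in "First-name Last-name" form are separated by a single space.

```python
import sqlite3
import unittest

# Normalize each prospect to a lowercase "first last" key, keep the highest
# estimate per key, then join to customers with a lower current limit.
REPORT_SQL = """
WITH normalized AS (
    SELECT
        CASE WHEN instr(full_name, ',') > 0
             -- "Last-name, First-name": swap the two halves
             THEN lower(trim(substr(full_name, instr(full_name, ',') + 1)))
                  || ' ' ||
                  lower(trim(substr(full_name, 1, instr(full_name, ',') - 1)))
             -- "First-name Last-name": only the case needs normalizing
             ELSE lower(trim(full_name))
        END AS name_key,
        credit_limit
    FROM prospects
),
best AS (
    SELECT name_key, MAX(credit_limit) AS new_limit
    FROM normalized
    GROUP BY name_key
)
SELECT c.first_name, c.last_name, c.credit_limit AS old_limit, b.new_limit
FROM customers AS c
JOIN best AS b
  ON b.name_key = lower(trim(c.first_name)) || ' ' || lower(trim(c.last_name))
WHERE b.new_limit > c.credit_limit
ORDER BY c.id
"""


def run_report(conn):
    """Run the report query and return its rows as a list of tuples."""
    return conn.execute(REPORT_SQL).fetchall()


class TestMessyDataSketch(unittest.TestCase):

    def setUp(self):
        self.conn = sqlite3.connect(":memory:")
        self.conn.executescript("""
            CREATE TABLE customers (
                id INT, first_name TEXT, last_name TEXT, credit_limit FLOAT);
            CREATE TABLE prospects (full_name TEXT, credit_limit FLOAT);
        """)
        self.conn.executemany(
            "INSERT INTO customers VALUES (?, ?, ?, ?)",
            [(1, "Jane", "Doe", 1000.0), (2, "John", "Smith", 5000.0)],
        )
        self.conn.executemany(
            "INSERT INTO prospects VALUES (?, ?)",
            [("JANE DOE", 1500.0), ("doe, jane", 2000.0),
             ("Smith, John", 4000.0), ("Alice Jones", 9000.0)],
        )

    def tearDown(self):
        self.conn.close()

    def test_report(self):
        # Jane Doe has duplicate estimates (highest wins); John Smith's
        # estimate is lower than his limit; Alice Jones is not a customer.
        self.assertEqual(run_report(self.conn),
                         [("Jane", "Doe", 1000.0, 2000.0)])
```

The fixture covers the cases the exercise names: both name formats, mixed capitalization, duplicate estimates, an estimate lower than the current limit, and a prospect outside the customer base.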