# Exploratory Data Analysis (EDA)

In this notebook, I analyze the input data `rooms.txt` and walk you through my thinking process toward a solution to this problem.

In [13]:
from collections import Counter

## First approach that comes to mind

1. Split the big string into multiple strings: one for each room
2. For each string, extract the name of the corresponding room
3. For each string, count the number of chairs of each type
4. Gather all the results to calculate the "total"
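Steps 2 to 4 are straightforward once step 1 is done. Here is a minimal sketch, assuming step 1 has already produced one string per room. The names `room_texts`, `room_name` and `chair_counts` are hypothetical, and the room strings are made up rather than taken from `rooms.txt`.

```python
import re
from collections import Counter

CHAIRS = {'W', 'P', 'S', 'C'}

# Hypothetical output of step 1: the text belonging to each room (made-up data)
room_texts = [
    "| (closet)  |\n|         P |\n|         P |",
    "|   S    (sleeping room) |\n|    W     |",
]

def room_name(text):
    """Step 2: assume the room name is the text between parentheses."""
    match = re.search(r"\(([^)]+)\)", text)
    return match.group(1) if match else None

def chair_counts(text):
    """Step 3: count each chair letter, reporting 0 for absent types."""
    counts = Counter(ch for ch in text if ch in CHAIRS)
    return {chair: counts[chair] for chair in sorted(CHAIRS)}

per_room = {room_name(t): chair_counts(t) for t in room_texts}

# Step 4: the total is the sum of the per-room counts
total = Counter()
for counts in per_room.values():
    total.update(counts)
```

Comparing this per-room `total` with a direct count over the whole file is a cheap consistency check.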

The first step seems to be the most difficult.
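One possible way to tackle step 1 (my assumption, not something the task prescribes) is a flood fill: start from a cell inside a room, for example the `'('` of its name, and visit every neighbouring cell that is not a wall character (`+`, `-`, `|`). The sketch below runs a breadth-first flood fill on a tiny made-up plan; `flood_fill` and `plan` are hypothetical names, and it assumes every room is fully enclosed by walls.

```python
from collections import deque

WALLS = set('+-|')
CHAIRS = set('WPSC')

def flood_fill(grid, start):
    """Return every (row, col) reachable from start without crossing a wall."""
    seen = {start}
    queue = deque([start])
    while queue:
        r, c = queue.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            # Stay inside the plan, skip visited cells and walls
            if (0 <= nr < len(grid) and 0 <= nc < len(grid[nr])
                    and (nr, nc) not in seen and grid[nr][nc] not in WALLS):
                seen.add((nr, nc))
                queue.append((nr, nc))
    return seen

# Tiny made-up plan with two enclosed rooms
plan = [
    "+-----+-----+",
    "| (a) |  (b)|",
    "|  P  | W W |",
    "+-----+-----+",
]

# Start each fill from the '(' of the room name
room_a = flood_fill(plan, (1, 2))
room_b = flood_fill(plan, (1, 9))
chairs_a = sorted(plan[r][c] for r, c in room_a if plan[r][c] in CHAIRS)
chairs_b = sorted(plan[r][c] for r, c in room_b if plan[r][c] in CHAIRS)
```

Breadth-first search is used here only because it avoids recursion limits on large rooms; a depth-first fill would find the same cells.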

## Read and transform the input data


From the `task_en.txt` file, we can already extract information about the chairs:

```
The different types of chairs are as follows:
W: wooden chair
P: plastic chair
S: sofa chair
C: china chair
```

As we only need the capital letters from now on, we save them in a `set` called `chairs`.

In [25]:
chairs = {'W', 'P', 'S', 'C'}

### Import the data as a string

In [32]:
with open('rooms.txt', 'r') as f:
    rooms_string = f.read()
Counter(rooms_string)

Counter({'+': 24,
         '-': 240,
         '\n': 49,
         '|': 124,
         ' ': 2005,
         '(': 8,
         'c': 4,
         'l': 5,
         'o': 10,
         's': 2,
         'e': 6,
         't': 5,
         ')': 8,
         'P': 7,
         'S': 3,
         'p': 1,
         'i': 6,
         'n': 4,
         'g': 2,
         'r': 3,
         'm': 3,
         'W': 14,
         'f': 2,
         'C': 1,
         'b': 2,
         'a': 2,
         'h': 2,
         'k': 1,
         '/': 4,
         'v': 1,
         'y': 1})

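The `Counter` output already tells us a lot about the data:
- The `'-'` and `'|'` characters draw the walls that delimit the rooms: `'-'` for horizontal walls and `'|'` for vertical ones. Their counts (240 and 124) include interior walls, so they do not directly give the dimensions of the apartment.

- The number of `'+'` gives the number of corners and wall junctions in the apartment

- The number of `'('` (or `')'`) gives the number of rooms, assuming every room name is written in parentheses

- The number of `'\n'` + 1 gives the number of lines of the plan, i.e. the length of the apartment (assuming the file does not end with a newline)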

- **Most importantly:** collecting the capital-letter keys and their counts already gives the relevant information for the first output line of the problem: **total**. Since we also need the chair distribution in each room, the problem is not solved yet, but we can definitely:
    1. Save the **total** result as a check for later, once we have gathered the information for each room
    2. Probably reuse the `Counter` function later on, and use the Python built-in object `dict` to save the `total` output
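The observations above can be checked with a few lines of code. This is a minimal sketch on a tiny made-up plan (`sample` is hypothetical, not the real `rooms.txt`), assuming room names are written in parentheses and the string does not end with a newline.

```python
from collections import Counter

# Tiny made-up plan with two rooms, no trailing newline
sample = (
    "+-----+-----+\n"
    "| (a) |  (b)|\n"
    "|  P  | W W |\n"
    "+-----+-----+"
)

counts = Counter(sample)
n_junctions = counts['+']     # corners and wall junctions
n_rooms = counts['(']         # one '(' per room name
n_lines = counts['\n'] + 1    # valid only without a trailing newline
total = {ch: n for ch, n in counts.items() if ch in {'W', 'P', 'S', 'C'}}
```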

In [33]:
chair_counts = Counter(rooms_string)  # count every character only once
total = {chair: n for chair, n in chair_counts.items() if chair in chairs}
total

{'P': 7, 'S': 3, 'W': 14, 'C': 1}

As the input represents a 2D plan, I think the best way to locate things in it is to transform it into a 2D array.
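A character-level grid is easiest to index as `grid[row][col]` when every row has the same length. A minimal sketch, assuming shorter lines may safely be padded with spaces (`text` and `grid` are hypothetical names, and the plan is made up):

```python
# Made-up ragged plan: the last line is shorter than the others
text = "+---+\n| P |\n+--"

lines = text.splitlines()
width = max(len(line) for line in lines)
# Pad each line with spaces, then split it into single characters
grid = [list(line.ljust(width)) for line in lines]
```

With this layout, `grid[1][2]` is the chair `'P'`, and neighbouring cells are one index away in either direction.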

In [34]:
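# splitlines() already removes the '\n', so each row is a one-element list
# holding the whole line; a character is reached with rooms[row][0][col]
rooms = [[line] for line in rooms_string.splitlines()]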
rooms

[['+-----------+------------------------------------+'],
 ['|           |                                    |'],
 ['| (closet)  |                                    |'],
 ['|         P |                            S       |'],
 ['|         P |         (sleeping room)            |'],
 ['|         P |                                    |'],
 ['|           |                                    |'],
 ['+-----------+    W                               |'],
 ['|           |                                    |'],
 ['|        W  |                                    |'],
 ['|           |                                    |'],
 ['|           +--------------+---------------------+'],
 ['|                          |                     |'],
 ['|                          |                W W  |'],
 ['|                          |    (office)         |'],
 ['|                          |                     |'],
 ['+--------------+           |                     |'],
 ['|              |           |