# Exploratory Data Analysis (EDA)

In this notebook, I analyze the input data `rooms.txt` and walk you through my thinking process toward a solution to this problem.

In [1]:
from collections import Counter, defaultdict
import re

## First approach that comes to mind

1. Split the big string into multiple strings: one for each room
2. For each string, extract the name of the corresponding room
3. For each string, count the number of each chair
4. Gather all the results to calculate the "total"

The first step seems to be the most difficult.

## Read and transform the input data


From the `task_en.txt` file, we can already extract information about the chairs:

```
The different types of chairs are as follows:
W: wooden chair
P: plastic chair
S: sofa chair
C: china chair
```

As we only need the capital letters in what follows, we save them in a `set` called `chairs`.

In [2]:
chairs = {'W', 'P', 'S', 'C'}

### Import the data as a string

In [3]:
with open('rooms.txt', 'r') as f:
    rooms_string = f.read()
Counter(rooms_string)

Counter({'+': 24,
         '-': 240,
         '\n': 49,
         '|': 124,
         ' ': 2005,
         '(': 8,
         'c': 4,
         'l': 5,
         'o': 10,
         's': 2,
         'e': 6,
         't': 5,
         ')': 8,
         'P': 7,
         'S': 3,
         'p': 1,
         'i': 6,
         'n': 4,
         'g': 2,
         'r': 3,
         'm': 3,
         'W': 14,
         'f': 2,
         'C': 1,
         'b': 2,
         'a': 2,
         'h': 2,
         'k': 1,
         '/': 4,
         'v': 1,
         'y': 1})

From the `Counter` output, we can already get a lot of information about the data:
- `'-'`, `'|'` and `'/'` are the characters that delimit the rooms. Horizontal wall characters outnumber vertical ones here (240 against 124): <br>```number_of('-') > number_of('|')```

- The number of `'+'` gives the number of corners (wall junctions) in the apartment

- The number of `'('` or `')'` gives the number of rooms

- The number of `'\n'` + 1 gives the number of lines in the plan, that is, the length of the apartment in rows

- **Most importantly:** gathering the capital-letter keys and their counts already gives the relevant information for the first output line of the problem: **total**. As we also need the distribution of chairs in each room, the problem is not yet solved, but we can definitely save the **total** result as a future check once the information for each room is gathered.
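
The observations above can be sketched on a tiny example. This is only an illustration: `sample_plan` is a hypothetical miniature plan (not the real `rooms.txt`), and the sketch assumes that room names contain no capital chair letters and that the text has no trailing newline.

```python
from collections import Counter

# Hypothetical miniature plan, used only to illustrate the idea.
sample_plan = "\n".join([
    "+--------+-------+",
    "| (hall) | (den) |",
    "| W   P  |  S  W |",
    "+--------+-------+",
])

counts = Counter(sample_plan)
chair_types = {"W", "P", "S", "C"}

n_rooms = counts["("]        # one opening parenthesis per room name
n_lines = counts["\n"] + 1   # only valid without a trailing newline
# A Counter returns 0 for missing keys, so absent chair types show up as 0.
totals = {chair: counts[chair] for chair in sorted(chair_types)}

print(n_rooms, n_lines, totals)
# 2 4 {'C': 0, 'P': 1, 'S': 1, 'W': 2}
```

Reading missing keys from a `Counter` does not raise a `KeyError`, which is why chair types absent from the plan can be reported as zero without extra checks.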

### Observation

According to `task_en.txt`, mistakes occurred in the past when *manually counting the various types of chairs*. This refers to the total number of chairs.

I imagine a mistake being, for example, forgetting a china chair.

Considering that:
- gathering only the total number of each type of chair can be implemented quickly
- this implementation will solve most of the client's problems
- finding the distribution of the chairs in each room looks much more complex

If I were given this problem in the context of my job, I would explain to the client the technical implications of the result they would like to get, and I would propose that they try the quick-and-easy solution first:
- Use my program to get the total number of each type of chair

In [4]:
counts = Counter(rooms_string)  # count the characters once and reuse the result

n_rooms = counts['(']
print(n_rooms)

total = {chair: count for chair, count in counts.items() if chair in chairs}
total

8


{'P': 7, 'S': 3, 'W': 14, 'C': 1}

As the input represents a 2D plan, I think the best way to locate things in it is to transform it into a 2D array.

In [5]:
rooms = [list(line) for line in rooms_string.splitlines()]

def print_of(rooms):
    return [''.join(row) for row in rooms]

print_of(rooms)

['+-----------+------------------------------------+',
 '|           |                                    |',
 '| (closet)  |                                    |',
 '|         P |                            S       |',
 '|         P |         (sleeping room)            |',
 '|         P |                                    |',
 '|           |                                    |',
 '+-----------+    W                               |',
 '|           |                                    |',
 '|        W  |                                    |',
 '|           |                                    |',
 '|           +--------------+---------------------+',
 '|                          |                     |',
 '|                          |                W W  |',
 '|                          |    (office)         |',
 '|                          |                     |',
 '+--------------+           |                     |',
 '|              |           |                     |',
 '| (toile

In [6]:
sep_chars = ['\\', '|', '/', '+', '-']
sep_chars

['\\', '|', '/', '+', '-']

In [7]:
print(sep_chars[0])

\


## Move letters approach

In [8]:
rooms = [list(line) for line in rooms_string.splitlines()]

def print_of(rooms):
    return [''.join(row) for row in rooms]

print_of(rooms)

['+-----------+------------------------------------+',
 '|           |                                    |',
 '| (closet)  |                                    |',
 '|         P |                            S       |',
 '|         P |         (sleeping room)            |',
 '|         P |                                    |',
 '|           |                                    |',
 '+-----------+    W                               |',
 '|           |                                    |',
 '|        W  |                                    |',
 '|           |                                    |',
 '|           +--------------+---------------------+',
 '|                          |                     |',
 '|                          |                W W  |',
 '|                          |    (office)         |',
 '|                          |                     |',
 '+--------------+           |                     |',
 '|              |           |                     |',
 '| (toile

In [9]:
dict_pos_chairs = {}
for i, row in enumerate(rooms):
    for j, element in enumerate(row):
        if element in chairs:
            dict_pos_chairs[(i, j)] = element
            # room = search_room(i, j)
            # rooms_chairs[room][element] += 1

print(dict_pos_chairs)
list_pos_chairs = sorted(dict_pos_chairs)  # tuples sort by row, then by column
list_pos_chairs

{(3, 10): 'P', (3, 41): 'S', (4, 10): 'P', (5, 10): 'P', (7, 17): 'W', (9, 9): 'W', (13, 44): 'W', (13, 46): 'W', (18, 41): 'P', (19, 4): 'C', (27, 34): 'W', (27, 38): 'W', (28, 34): 'W', (28, 38): 'W', (29, 8): 'P', (33, 38): 'W', (33, 43): 'W', (33, 47): 'W', (36, 2): 'S', (36, 38): 'W', (36, 43): 'W', (36, 47): 'W', (38, 2): 'S', (45, 46): 'P', (47, 45): 'P'}


[(3, 10),
 (3, 41),
 (4, 10),
 (5, 10),
 (7, 17),
 (9, 9),
 (13, 44),
 (13, 46),
 (18, 41),
 (19, 4),
 (27, 34),
 (27, 38),
 (28, 34),
 (28, 38),
 (29, 8),
 (33, 38),
 (33, 43),
 (33, 47),
 (36, 2),
 (36, 38),
 (36, 43),
 (36, 47),
 (38, 2),
 (45, 46),
 (47, 45)]

In [28]:
rooms = [list(line) for line in rooms_string.splitlines()]
dict_rooms_chairs = defaultdict(list)

def is_in_room_same_row(i, j):
    """True if, ignoring spaces, the chair sits right next to a room name on its row."""
    chair = rooms[i][j]
    row_no_space = [e for e in rooms[i] if e != ' ']
    # Note: index() returns the first occurrence of this chair letter in the row
    k = row_no_space.index(chair)
    return row_no_space[k+1] == '(' or row_no_space[k-1] == ')'


def find_room_on_same_row(i, j):
    """Return the name of the room whose label is adjacent to the chair on row i."""
    chair = rooms[i][j]
    row = [e for e in rooms[i] if e != ' ']
    j = row.index(chair)
    if row[j+1] == '(':
        str_to_inspect = ''.join(row[j+1:])
        # Raw strings avoid invalid escape sequence warnings in the pattern
        return re.split(r'\(|\)', str_to_inspect)[1]
    elif row[j-1] == ')':
        str_to_inspect = ''.join(row[:j-1])
        return re.split(r'\(|\)', str_to_inspect)[-1]

def change_pos(i, j, new_i, new_j):
    """Swap two cells of the plan and return the new position."""
    rooms[i][j], rooms[new_i][new_j] = rooms[new_i][new_j], rooms[i][j]
    return new_i, new_j


     

def search_room(i_start: int, j_start: int):
    """Move the chair vertically until it sits next to a room name, then record it."""
    i, j = i_start, j_start
    chair = rooms[i][j]
    direction = 'up'
    while not is_in_room_same_row(i, j):
        if rooms[i-1][j] in sep_chars:  # A wall blocks the way up
            direction = 'down'
            # Put the chair back at its initial position before going down
            i, j = change_pos(i=i, j=j, new_i=i_start, new_j=j)

        if direction == 'up':
            i, j = change_pos(i=i, j=j, new_i=i-1, new_j=j)
        elif direction == 'down':
            i, j = change_pos(i=i, j=j, new_i=i+1, new_j=j)

    room_of_chair = find_room_on_same_row(i, j)
    # Remove the chair from the plan
    rooms[i][j] = ' '
    # Record it under its room
    dict_rooms_chairs[room_of_chair].append(chair)

for k, pairs in enumerate(list_pos_chairs):
    # if k+1 in [1, 2, 3, 4, 5, 7, 8, 9, 16, 17]:
    if k+1 in [1, 2, 3, 4, 5, 7, 8, 9, 24, 25]:
    # if k in [18, 19, 20]:
        i, j = pairs
        print(i, j, dict_pos_chairs[pairs])
        search_room(i, j)



print(dict_rooms_chairs)
print_of(rooms)

3 10 P
3 41 S
4 10 P
5 10 P
7 17 W
13 44 W
13 46 W
18 41 P
45 46 P
47 45 P
defaultdict(<class 'list'>, {'closet': ['P', 'P', 'P'], 'sleepingroom': ['S', 'W'], 'office': ['W', 'W', 'P'], 'balcony': ['P', 'P']})


['+-----------+------------------------------------+',
 '|           |                                    |',
 '| (closet)  |                                    |',
 '|           |                                    |',
 '|           |         (sleeping room)            |',
 '|           |                                    |',
 '|           |                                    |',
 '+-----------+                                    |',
 '|           |                                    |',
 '|        W  |                                    |',
 '|           |                                    |',
 '|           +--------------+---------------------+',
 '|                          |                     |',
 '|                          |                     |',
 '|                          |    (office)         |',
 '|                          |                     |',
 '+--------------+           |                     |',
 '|              |           |                     |',
 '| (toile

### Remaining steps

1. Chair above or below the room name
2. Chairs in the same column
3. Long way to the room name
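
One possible way to cover all three remaining cases at once is a flood fill (breadth-first search) started from each room name. This is a different technique from the move letters approach above, shown only as a hedged sketch. The names `WALLS`, `CHAIRS`, `chairs_per_room` and the miniature plan are hypothetical, and the sketch assumes that every room is fully enclosed by wall characters and that room names contain no capital chair letters.

```python
from collections import Counter, deque
import re

WALLS = set("+-|/\\")   # characters treated as impassable (assumption)
CHAIRS = set("WPSC")

def chairs_per_room(plan: str) -> dict:
    """Flood-fill from each '(name)' label and count the chairs reached."""
    grid = plan.splitlines()
    seen = set()
    result = {}
    for i, line in enumerate(grid):
        for match in re.finditer(r"\(([^)]*)\)", line):
            start = (i, match.start())
            if start in seen:
                continue  # this label lies in an area that was already filled
            counts = Counter()
            queue = deque([start])
            seen.add(start)
            while queue:
                r, c = queue.popleft()
                if grid[r][c] in CHAIRS:
                    counts[grid[r][c]] += 1
                # Visit the four direct neighbours that are not walls
                for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
                    if (0 <= nr < len(grid) and 0 <= nc < len(grid[nr])
                            and (nr, nc) not in seen
                            and grid[nr][nc] not in WALLS):
                        seen.add((nr, nc))
                        queue.append((nr, nc))
            result[match.group(1)] = dict(counts)
    return result

# Hypothetical miniature plan, not the real rooms.txt
sample = "\n".join([
    "+-------+-----+",
    "| (hall)|  W  |",
    "| W     |(den)|",
    "|   P   | S   |",
    "+-------+-----+",
])
per_room = chairs_per_room(sample)
print(per_room)
```

Because the fill spreads in every direction until it hits a wall, it does not matter whether a chair is above, below, or far away from the room name. Summing the per-room counts and comparing them with the **total** computed earlier would be a natural consistency check.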

## Descent approach

In [None]:



chairs_current_rooms = None
for row in rooms:
    line = ''.join(row)  # `rooms` holds lists of characters; string methods need a str
    inner = line[1:-1]   # drop the outer wall on both sides
    count_number_rooms = sum(inner.count(sep_char) for sep_char in sep_chars) + 1
    if not chairs_current_rooms:
        chairs_current_rooms = [{'room_name': None, 'chairs': defaultdict(int)} for _ in range(count_number_rooms)]

    print(count_number_rooms)

    info_rooms = inner.replace(' ', '').split('|')

    if '-' in line:
        print('CHANGE ROOM')
        print(info_rooms)

    for room_number, info in enumerate(info_rooms):
        if not info:
            pass
        elif info.startswith('('):
            chairs_current_rooms[room_number]['room_name'] = info[1:-1]
        else:
            # Count every chair letter, so that segments such as 'WW' are not skipped
            for char in info:
                if char in chairs:
                    chairs_current_rooms[room_number]['chairs'][char] += 1

    print(chairs_current_rooms)
    print('\n')
        
