<a href="https://colab.research.google.com/github/natnew/A-Systems-Approach/blob/main/Data_Science_BootCamp_Working_With_Data_in_Python_Week_3.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Introduction

One of the key skills of a data scientist is breaking problems down into their constituent parts and 
approaching each part in a logical, systematic manner. <br>
In machine learning and AI applications, the core aim is to create a machine that can 
replicate, or even exceed, the performance of a human at a given task. One 
example that attracts significant attention is self-driving vehicles, in which a machine 
masters the complex, dynamic systems of roads and traffic while benefitting from advantages 
such as near-instantaneous reaction times and a lack of fatigue.

# A systems approach

i) Think of some common actions a self-driving vehicle must perform, e.g., pulling away from a 
traffic light, turning left or staying in lane. For at least three of these actions, write 
pseudocode that describes the components of each action. For example, for approaching a 
traffic light:

>IF traffic light is green<br>
>THEN maintain speed<br>
>ELSE reduce speed to stationary<br>
>END IF<br>

>IF vehicle ahead is moving<br>
>THEN continue to move forward<br>
>ELSE reduce speed to stationary<br>
>END IF<br>

>IF no person is crossing the road<br>
>THEN indicate left and slowly turn into road<br>
>ELSE reduce speed to stationary<br>
>END IF<br>

>IF pedestrian is standing at the crossing<br>
>THEN reduce speed to stationary<br>
>ELSE continue to move forward<br>
>END IF<br>
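The four rules above can be written as plain Python functions. This is a minimal sketch: the `Scene` fields and the `decide` conflict policy (if any rule says stop, stop) are illustrative assumptions, not part of the exercise. Writing it out makes the limitation of fixed rules concrete: every situation, and every conflict between rules, has to be anticipated and encoded by hand.

```python
from dataclasses import dataclass

STOP = "reduce speed to stationary"


@dataclass
class Scene:
    """Hypothetical sensor snapshot; the field names are illustrative only."""
    light_is_green: bool
    vehicle_ahead_moving: bool
    person_crossing: bool
    pedestrian_at_crossing: bool


def approach_traffic_light(scene: Scene) -> str:
    # IF traffic light is green THEN maintain speed ELSE stop
    return "maintain speed" if scene.light_is_green else STOP


def follow_vehicle_ahead(scene: Scene) -> str:
    # IF vehicle ahead is moving THEN continue ELSE stop
    return "continue to move forward" if scene.vehicle_ahead_moving else STOP


def turn_left(scene: Scene) -> str:
    # IF no person is crossing the road THEN indicate and turn ELSE stop
    if not scene.person_crossing:
        return "indicate left and slowly turn into road"
    return STOP


def approach_pedestrian_crossing(scene: Scene) -> str:
    # IF pedestrian is standing at the crossing THEN stop ELSE continue
    return STOP if scene.pedestrian_at_crossing else "continue to move forward"


def decide(scene: Scene, rules) -> str:
    """Assumed conflict policy: if any rule says stop, stop; otherwise take the first rule's action.

    `rules` must contain at least one rule function.
    """
    actions = [rule(scene) for rule in rules]
    return STOP if STOP in actions else actions[0]


# Green light, but a pedestrian is waiting at the crossing: the two rules disagree.
scene = Scene(light_is_green=True, vehicle_ahead_moving=True,
              person_crossing=False, pedestrian_at_crossing=True)
print(decide(scene, [approach_traffic_light, approach_pedestrian_crossing]))
# reduce speed to stationary
```

Choosing "most cautious action wins" is only one possible policy; the point is that some explicit policy is needed as soon as two fixed rules apply at once.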

This sort of fixed-input, fixed-output logic works well enough for set scenarios, and these basic 
instructions are the most straightforward part to get right. What it does not provide is the ability 
for the system to react to dynamic environments and unstructured problems. Enabling the system to 
think “for itself” through machine learning is considerably more difficult.

The general format for this system is shown in Diagram 1: [Diagram - Systems thinking](https://github.com/natnew/A-Systems-Approach/blob/main/Automated%20driving%20system.JPG)

Think about the practicalities of building a self-driving vehicle system. What inputs might you 
want to gather? What training data would you use, and how would you collect it? Which possible 
outputs would you have to consider (in addition to those shown in Diagram 1)?

Training data:<br>
> Actions to follow for road signs<br>
> Actions to follow for pedestrians and hazards<br>
> Actions to take in different weather conditions<br>
> Actions to take in the event of an accident<br>
> Actions to take when interacting with police officers/traffic wardens<br>


# Data Exploration and Descriptive Analytics 

PaperBoss Ltd are a company that sell office supplies to corporate clients within the UK. The dataset 
below presents some recent data on the sales reps employed by the company. The management 
team of the company have some basic questions that they would like answered:


- What is the mean client spend over the three-month period?
- Are there any offices that are underperforming, compared to the others?
- Does sales rep experience have any impact on sales (measured in total takings)?

However, they would also like you, as a data practitioner, to provide some additional insight into 
their organisation. Using whichever methods of analysis and visualisation you wish, explore the 
dataset below. Describe and summarise the data in as many ways as you can.

```python
# Load libraries
import pandas as pd
```


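```python
# Load the sales-rep data. encoding='unicode_escape' lets the '£' characters in the
# headers decode (presumably the file is not UTF-8); 'latin-1' or 'cp1252' may be
# cleaner alternatives if that is how the file was saved.
data1 = pd.read_csv('data1.csv', encoding='unicode_escape')
```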

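```python
data1.head()
```

| | Country | Office | Rep Employee ID | Salary (£k) | Starting date | No. clients | June Sales (£k) | July Sales (£k) | August Sales (£k) |
|---:|---|---|---:|---:|---:|---:|---:|---:|---:|
| 0 | England | Bath | 131 | 43 | 2011 | 12 | 48 | 69 | 58 |
| 1 | England | Bath | 132 | 44 | 2018 | 12 | 48 | 35 | 59 |
| 2 | England | Bath | 133 | 36 | 2011 | 14 | 42 | 98 | 53 |
| 3 | England | Bath | 134 | 46 | 2010 | 15 | 45 | 68 | 70 |
| 4 | England | London | 111 | 37 | 2015 | 13 | 39 | 66 | 44 |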


```python
data1.shape
```

```
(40, 9)
```

The data has 40 rows and 9 columns.

```python
data1.columns
```

```
Index(['Country ', 'Office', 'Rep Employee ID', 'Salary (£k)', 'Starting date',
       'No. clients', 'June Sales (£k)', 'July Sales (£k)',
       'August Sales (£k)'],
      dtype='object')
```
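The `columns` output shows that the first header is `'Country '`, with a trailing space, so `data1['Country']` would raise a `KeyError`. Below is a minimal sketch of one way to normalise the headers before further analysis; `clean_column_names` is a hypothetical helper name, and the small demo frame only reproduces the problem header.

```python
import pandas as pd


def clean_column_names(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with leading/trailing whitespace stripped from column names."""
    out = df.copy()
    out.columns = out.columns.str.strip()
    return out


# Small frame reproducing the 'Country ' header seen in data1.columns
demo = pd.DataFrame({"Country ": ["England"], "Office": ["Bath"]})
clean = clean_column_names(demo)
print(list(clean.columns))  # ['Country', 'Office']
```

In the notebook this would be applied once, straight after loading: `data1 = clean_column_names(data1)`.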

```python
# describe() must be called: without the parentheses pandas returns the bound
# method itself rather than the summary statistics of the numeric columns
data1.describe()
```

```python
data1.info()
```

```
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 40 entries, 0 to 39
Data columns (total 9 columns):
 #   Column             Non-Null Count  Dtype 
---  ------             --------------  ----- 
 0   Country            40 non-null     object
 1   Office             40 non-null     object
 2   Rep Employee ID    40 non-null     int64 
 3   Salary (£k)        40 non-null     int64 
 4   Starting date      40 non-null     int64 
 5   No. clients        40 non-null     int64 
 6   June Sales (£k)    40 non-null     int64 
 7   July Sales (£k)    40 non-null     int64 
 8   August Sales (£k)  40 non-null     int64 
dtypes: int64(7), object(2)
memory usage: 2.9+ KB
```
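With no missing values and integer sales columns, the management team's three questions can be approached directly with pandas. This is a minimal sketch under stated assumptions: "mean client spend" is read as all June-August takings divided by all clients, "underperforming" is judged by mean three-month sales per rep within each office, and experience is proxied by `Starting date`. The helper names are hypothetical. To stay self-contained it uses only the five rows shown by `data1.head()`, so its numbers are not conclusions about the full dataset; in the notebook, pass `data1` instead.

```python
import pandas as pd

SALES_COLS = ["June Sales (£k)", "July Sales (£k)", "August Sales (£k)"]

# Illustrative sample: the five rows shown by data1.head().
sample = pd.DataFrame(
    [[131, "Bath", 2011, 12, 48, 69, 58],
     [132, "Bath", 2018, 12, 48, 35, 59],
     [133, "Bath", 2011, 14, 42, 98, 53],
     [134, "Bath", 2010, 15, 45, 68, 70],
     [111, "London", 2015, 13, 39, 66, 44]],
    columns=["Rep Employee ID", "Office", "Starting date", "No. clients"] + SALES_COLS,
)


def with_totals(df: pd.DataFrame) -> pd.DataFrame:
    """Copy of df with each rep's three-month total and spend per client."""
    out = df.copy()
    out["Total Sales (£k)"] = out[SALES_COLS].sum(axis=1)
    out["Spend per client (£k)"] = out["Total Sales (£k)"] / out["No. clients"]
    return out


def mean_client_spend(df: pd.DataFrame) -> float:
    """Q1 (assumed definition): all June-August takings divided by all clients."""
    d = with_totals(df)
    return d["Total Sales (£k)"].sum() / d["No. clients"].sum()


def office_summary(df: pd.DataFrame) -> pd.DataFrame:
    """Q2: per-office figures, sorted so the weakest office (by sales per rep) comes first."""
    d = with_totals(df)
    return (
        d.groupby("Office")
        .agg(
            reps=("Rep Employee ID", "count"),
            total_sales=("Total Sales (£k)", "sum"),
            sales_per_rep=("Total Sales (£k)", "mean"),
            spend_per_client=("Spend per client (£k)", "mean"),
        )
        .sort_values("sales_per_rep")
    )


def experience_sales_correlation(df: pd.DataFrame) -> float:
    """Q3: Pearson correlation between years of service and total sales.

    Years are counted relative to the latest start year in the data (an assumption);
    any fixed reference year gives the same correlation.
    """
    d = with_totals(df)
    years = d["Starting date"].max() - d["Starting date"]
    return years.corr(d["Total Sales (£k)"])


print(round(mean_client_spend(sample), 2))
print(office_summary(sample))
print(round(experience_sales_correlation(sample), 2))
```

A positive correlation on the full data would suggest that longer-serving reps tend to take more, but it would not by itself show that experience causes the higher sales (client numbers and office location may also differ).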


```python
from matplotlib import pyplot as plt
```
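With `pyplot` imported, a first visual summary could compare offices and look at rep experience. A minimal sketch, assuming a bar chart of total June-August sales per office and a scatter of starting year against each rep's total are useful starting points; `plot_sales_overview` is a hypothetical helper, and the sample again uses only the five rows shown by `data1.head()`.

```python
import pandas as pd
from matplotlib import pyplot as plt

SALES_COLS = ["June Sales (£k)", "July Sales (£k)", "August Sales (£k)"]

# Illustrative sample: the five rows shown by data1.head(); use data1 in the notebook.
sample = pd.DataFrame(
    [["Bath", 2011, 48, 69, 58],
     ["Bath", 2018, 48, 35, 59],
     ["Bath", 2011, 42, 98, 53],
     ["Bath", 2010, 45, 68, 70],
     ["London", 2015, 39, 66, 44]],
    columns=["Office", "Starting date"] + SALES_COLS,
)


def plot_sales_overview(df: pd.DataFrame):
    """Bar chart of total sales per office and a scatter of start year vs each rep's total."""
    rep_totals = df[SALES_COLS].sum(axis=1)
    office_totals = rep_totals.groupby(df["Office"]).sum()

    fig, (ax_bar, ax_scatter) = plt.subplots(1, 2, figsize=(10, 4))
    ax_bar.bar(office_totals.index, office_totals.values)
    ax_bar.set_title("Total sales by office, June-August")
    ax_bar.set_ylabel("Sales (£k)")

    ax_scatter.scatter(df["Starting date"], rep_totals)
    ax_scatter.set_title("Rep starting year vs total sales")
    ax_scatter.set_xlabel("Starting date")
    ax_scatter.set_ylabel("Total sales (£k)")

    fig.tight_layout()
    return fig, office_totals


fig, office_totals = plot_sales_overview(sample)
```

In Colab the figure is displayed inline; in a standalone script, `plt.show()` would open it. Because offices may have different numbers of reps, total sales per office should be read alongside sales per rep.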

Investigate what a BIM file format is. Is there a preferred neutral file format for 
exchanging data that is promoted by the International Organization for Standardization (ISO) or any 
other standards organisation? Write down your findings and consider how a lack of 
interoperability limits data exchange between different software.


> IFC and COBie are the preferred file formats for data exchange in the construction industry. IFC is maintained by buildingSMART and published by ISO as ISO 16739.<br>
> Most construction software solutions can now export data in these file formats. However, solutions without interoperable capabilities risk convoluted workflows, data inaccuracy and data loss. <br>
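One reason IFC works as a neutral format is that its most common encoding is plain text in the STEP physical file format (ISO 10303-21), so any tool can read it without the software that created it. The sketch below is a minimal illustration under stated assumptions: `SAMPLE_IFC` is a hypothetical, hand-trimmed file that has not been validated against the IFC schema, and the line-based regular expression ignores entities that span several lines or strings containing parentheses. Real files should be handled with a dedicated IFC parser such as the open-source IfcOpenShell library.

```python
import re
from collections import Counter

# Hypothetical, heavily trimmed IFC file (STEP physical file format, ISO 10303-21).
# Written for illustration only; not validated against the IFC schema.
SAMPLE_IFC = """ISO-10303-21;
HEADER;
FILE_DESCRIPTION(('ViewDefinition [CoordinationView]'),'2;1');
FILE_NAME('example.ifc','2021-01-01T00:00:00',(''),(''),'','','');
FILE_SCHEMA(('IFC4'));
ENDSEC;
DATA;
#1=IFCPROJECT('0YvctVUKr0kugbFTf53O9L',$,'Example project',$,$,$,$,$,$);
#2=IFCWALL('2O2Fr$t4X7Zf8NOew3FLOH',$,'Wall 1',$,$,$,$,$,$);
#3=IFCWALL('1hOSvn6df7F8_7GcBWlRGQ',$,'Wall 2',$,$,$,$,$,$);
#4=IFCDOOR('3cUkl32yn9qRSPvBJVyWYp',$,'Door 1',$,$,$,$,$,$,$,$,$,$);
ENDSEC;
END-ISO-10303-21;
"""

# Entity instances look like "#<id>=<TYPE>(...)" at the start of a line (simplification).
ENTITY_RE = re.compile(r"^#(\d+)\s*=\s*([A-Z][A-Z0-9_]*)\s*\(", re.MULTILINE)
SCHEMA_RE = re.compile(r"FILE_SCHEMA\(\('([^']+)'\)\)")


def summarise_ifc(text: str):
    """Return the declared schema and a count of entity types in the file."""
    schema_match = SCHEMA_RE.search(text)
    schema = schema_match.group(1) if schema_match else None
    counts = Counter(name for _, name in ENTITY_RE.findall(text))
    return schema, counts


schema, counts = summarise_ifc(SAMPLE_IFC)
print(schema)  # IFC4
print(counts.most_common())
```

Nothing here depends on a particular vendor's software, which is the practical meaning of "neutral": the structure is defined by a published schema rather than by the authoring tool.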

Research, list and produce a narrative of the typical issues often raised as concerns by practitioners 
and companies in connection with the lack of, or use of, BIM data: for example, quantity, inconsistency, 
accuracy and format, with reference to buildingSMART, IFC, COBie, BCF, etc.

> Accuracy of BIM models (graphical and non-graphical)<br>
> Quality of design and data <br>
> Transparency and accountability <br>
> Value <br>
> Miscommunication <br>
> Duplicated work <br>
> Siloed data <br>

Read and review the Autodesk report: Harnessing the Data Advantage in Construction (2021).