# Requirements
Basic knowledge of Python is needed to follow this notebook. Check the subjects listed in these courses:
- [Python course](https://www.kaggle.com/learn/python)
- [Pandas course](https://www.kaggle.com/learn/pandas)
- Check out [this notebook](https://www.kaggle.com/ponybiam/introduction-to-ifcopenshell-functions) to get familiar with the package `ifcopenshell`.

# About the environment
This Python 3 environment comes with many helpful analytics libraries installed. It is defined by the [kaggle/python Docker image](https://github.com/kaggle/docker-python). Input data files are available in the read-only `../input/` directory. For example, running this (by clicking run or pressing Shift+Enter) will list all files under the input directory:

In [None]:
import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))
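If you only need the IFC files, `pathlib` can filter by extension. This is a minimal sketch: the helper name `list_ifc_files` is hypothetical, and the default root assumes the Kaggle `/kaggle/input` layout used above.

```python
from pathlib import Path


def list_ifc_files(root="/kaggle/input"):
    """Return the paths of all .ifc files under `root`, sorted (hypothetical helper)."""
    # rglob("*") walks the directory tree recursively, like os.walk;
    # the suffix check is case-insensitive so "model.IFC" is also found
    return sorted(
        p for p in Path(root).rglob("*")
        if p.is_file() and p.suffix.lower() == ".ifc"
    )


# Only list files if the Kaggle input directory actually exists
root = Path("/kaggle/input")
if root.is_dir():
    for path in list_ifc_files(root):
        print(path)
```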

You can write up to 20GB to the current directory (`/kaggle/working/`) that gets preserved as output when you create a version using "Save & Run All". You can also write temporary files to `/kaggle/temp/`, but they won't be saved outside of the current session.

# Load packages
First we are going to install the [`ifcopenshell`](http://ifcopenshell.org/) package. *IfcOpenShell* is an open source software library that helps users and software developers to work with the IFC file format. The IFC file format can be used to describe building and construction data. The format is commonly used for Building Information Modelling (BIM).<br>
<br>
Run the following code to install the package in the current environment:

In [None]:
%conda install -y -c conda-forge -c oce -c dlr-sc -c ifcopenshell ifcopenshell

And now we import the packages we are going to use in this notebook:

In [None]:
import pandas as pd
import ifcopenshell

# Load dataset
The input dataset contains these files:

- `Grethes-hus-bok-2.ifc`
- `11134_V_Motebello_Heistopp_Rev.ifc`
- `11134_D_Motebello_Heistopp_Rev.ifc`

In this notebook we work with the first two. Let's use our newly installed package to open them:

In [None]:
file1 = ifcopenshell.open("../input/example-ifc-file/Grethes-hus-bok-2.ifc")
file2 = ifcopenshell.open("../input/example-ifc-file/11134_V_Motebello_Heistopp_Rev.ifc")

# Parse dataset
## File 1: *Grethes-hus-bok-2.ifc*
First, we are going to get all the elements of type [IfcBuildingElement](https://standards.buildingsmart.org/IFC/RELEASE/IFC4/ADD2_TC1/HTML/schema/ifcproductextension/lexical/ifcbuildingelement.htm) from the `Grethes-hus-bok-2.ifc` file. We will use the method [by_type](https://blenderbim.org/docs/ifcopenshell-python/api-documentation.html#ifcopenshell.file.file.by_type) (from the `ifcopenshell` package) to get a list of all the `IfcBuildingElement` entities, including subtypes such as `IfcWall` or `IfcSlab`:

In [None]:
elements = file1.by_type('IfcBuildingElement')
len(elements)

We have 89 elements in our list. Let's see what information we have about the first one. For this purpose we use the method [get_info](https://blenderbim.org/docs/ifcopenshell-python/api-documentation.html#ifcopenshell.entity_instance.entity_instance.get_info) with the parameter `recursive=True`; this returns the entity information as a dictionary, and any IFC entity found inside is parsed as a dictionary too:

In [None]:
# We are selecting only the first element of our list "elements": elements[0]
element_info = elements[0].get_info(recursive=True)
element_info

OK, that's a lot of information! Let's pick some of it to build a dataset: the element type, the element id, the global id, the name and the description. Remember, this is a Python dictionary, so you can access any of its keys with:

```
my_variable = my_dictionary["the_key_you_want_to_access"]
```
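Square-bracket access raises a `KeyError` when the key is absent. If a key may be missing, `dict.get` returns a default instead. The dictionary below is a made-up stand-in for `get_info` output, not data from the file. Note that the default only applies when the key is absent: a key that exists with the value `None` still returns `None`.

```python
# Hypothetical stand-in for a get_info() dictionary
my_dictionary = {"type": "IfcWall", "Name": "Basic Wall", "Description": None}

name = my_dictionary["Name"]                            # key exists -> its value
tag = my_dictionary.get("Tag")                          # key absent -> None instead of KeyError
tag_or_default = my_dictionary.get("Tag", "n/a")        # key absent -> the default you pass
description = my_dictionary.get("Description", "n/a")  # key exists with value None -> None
```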

In [None]:
# We create the variables
element_type = element_info["type"]
element_id = element_info["id"]
global_id = element_info["GlobalId"]
name = element_info["Name"]
description = element_info["Description"]

# And we print them
print(f"This IfcBuildingElement is an {element_type} with the id {element_id}, the global id {global_id} and it's called {name}. Maybe we have a description? {description}")

We don't have a description :( but that's not a problem. Missing data is something we will encounter often, and we'll learn how to deal with it later.
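As a small preview: once the values are in a DataFrame, pandas treats a `None` in an object column as missing, so `isna` detects it and `fillna` can replace it. The rows below are invented for illustration.

```python
import pandas as pd

# Invented example rows; the second description is missing (None)
df = pd.DataFrame({"name": ["Wall A", "Wall B"],
                   "description": ["Exterior wall", None]})

missing = df["description"].isna()                   # boolean Series: False, True
filled = df["description"].fillna("No description")  # None replaced by a placeholder
```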

These features were easy to get, but what if we want one that is nested deeper in the dictionary? Let's try to find the organization id. For that, we have to look inside `OwnerHistory`:

In [None]:
element_info["OwnerHistory"]

Did you find it? We have to follow this path:

```
OwnerHistory > OwningUser > TheOrganization > id
```
Let's do it:

In [None]:
# Get the organization id
organization_id = element_info["OwnerHistory"]["OwningUser"]["TheOrganization"]["id"]
# print it
print(f"The organization id is {organization_id}")
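The chained indexing above works for this file, but in IFC4 the `OwnerHistory` attribute is optional, so in other files it can be `None` and the chain fails with a `TypeError`. A minimal sketch of a safer lookup follows; `get_nested` is a hypothetical helper (not part of `ifcopenshell`), and the dictionaries are invented but shaped like `get_info(recursive=True)` output.

```python
def get_nested(data, keys, default=None):
    """Follow `keys` through nested dicts; return `default` if any level is missing or None.

    Hypothetical helper, not part of ifcopenshell.
    """
    current = data
    for key in keys:
        # Stop as soon as a level is not a dictionary or the key has no value
        if not isinstance(current, dict) or current.get(key) is None:
            return default
        current = current[key]
    return current


# Invented dictionaries shaped like get_info(recursive=True) output
with_history = {"OwnerHistory": {"OwningUser": {"TheOrganization": {"id": 42}}}}
without_history = {"OwnerHistory": None}

path = ["OwnerHistory", "OwningUser", "TheOrganization", "id"]
org_id = get_nested(with_history, path)             # follows the full path
missing_org_id = get_nested(without_history, path)  # stops at OwnerHistory=None
```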

Now that we have all the features we wanted, we can create a Pandas DataFrame with them. One way to do this is from a dictionary, so let's create one with the information we have:

In [None]:
# We create a dictionary
element_info_dictionary = {"element_id": element_id,
                            "global_id": global_id,
                            "element_type": element_type,
                            "name": name,
                            "organization_id": organization_id,
                            "description": description}

And now it's super easy to create a pandas DataFrame:

In [None]:
pd.DataFrame(element_info_dictionary, index=[0])
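Because every value in `element_info_dictionary` is a scalar, pandas needs an explicit `index`; without one, `pd.DataFrame` raises a `ValueError` asking you to pass an index. Wrapping the dictionary in a list is another way to get a single row. The values below are invented stand-ins for the variables extracted above.

```python
import pandas as pd

# Invented values standing in for the variables extracted above
row = {"element_id": 1, "global_id": "abc", "element_type": "IfcWall",
       "name": "Wall", "organization_id": 7, "description": None}

df_with_index = pd.DataFrame(row, index=[0])  # scalars + explicit index -> one row
df_from_list = pd.DataFrame([row])            # list with one dict -> one row, default index

try:
    pd.DataFrame(row)  # all scalars and no index
except ValueError as err:
    error_message = str(err)
```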

Great! We have one row of our dataset! What if we want to add all 89 elements we obtained? We can use a **for loop** that iterates over our `elements` list and extracts each of the features we want. It sounds fancy, but it's an easy task and a really useful tool. Make sure you read all the comments in the following code; it's explained step by step:

In [None]:
# Get all items of type "IfcBuildingElement"
elements = file1.by_type('IfcBuildingElement')

# Create an empty list to append each element
elements_list = []

# Loop over each of the elements in our list
for element in elements:
    # get element info (this is a dictionary)
    element_info = element.get_info(recursive=True)
    
    # Create desired variables, obtaining the value from the dictionary
    element_id = element_info["id"]
    global_id = element_info["GlobalId"]
    element_type = element_info["type"]
    name = element_info["Name"]
    organization_id = element_info["OwnerHistory"]["OwningUser"]["TheOrganization"]["id"]
    description = element_info["Description"]
    
    # Create a one-row DataFrame (the index value doesn't matter, we ignore it when concatenating)
    df = pd.DataFrame({"element_id": element_id,
                        "global_id": global_id,
                        "element_type": element_type,
                        "name": name,
                        "organization_id": organization_id,
                        "description": description}, index=[0])
    
    # Append to the list created at the beginning of this code
    elements_list.append(df)

Now it's time to put it all together to create our dataset. We use the Pandas function `concat`: you pass it a list of DataFrames (the ones you want to concatenate), and we set `ignore_index=True` to reindex the final DataFrame:

In [None]:
data1 = pd.concat(elements_list, ignore_index=True)
data1.head()
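To see why `ignore_index=True` matters: every DataFrame built in the loop has index `0`, so without it the result would repeat `0` on every row. A small sketch with two invented one-row frames:

```python
import pandas as pd

# Two invented one-row frames, each with index 0 (like the ones built in the loop)
a = pd.DataFrame({"name": ["Wall"]}, index=[0])
b = pd.DataFrame({"name": ["Slab"]}, index=[0])

kept = pd.concat([a, b])                           # index keeps the duplicates: 0, 0
renumbered = pd.concat([a, b], ignore_index=True)  # index becomes 0, 1
```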

Now we have a dataset with all the `IfcBuildingElement`s present in file1! It would be nice to add some information about the building. Let's explore another IFC entity, `IfcBuilding`:

In [None]:
buildings = file1.by_type("IfcBuilding")
buildings

We have only one building described in this IFC file. Let's get the dictionary with all its information:

In [None]:
building_info = buildings[0].get_info(recursive=True)
building_info

Let's add the building name and id to our dataset:

In [None]:
# First we get the variables we want
bdg_name = building_info["Name"]
bdg_id = building_info["id"]

# And then we add them to our dataset (each scalar value is repeated on every row)
data1["bdg_name"] = bdg_name
data1["bdg_id"] = bdg_id

data1.head()
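Assigning a scalar to a new column repeats it on every row, which is what happens above. `DataFrame.assign` does the same thing but returns a new DataFrame and leaves the original unchanged, which can be handy when chaining operations. A sketch with invented values:

```python
import pandas as pd

# Invented element table with three rows
data = pd.DataFrame({"element_id": [10, 11, 12]})

# Same effect as data["bdg_name"] = ... and data["bdg_id"] = ..., but returns a copy
with_building = data.assign(bdg_name="Example building", bdg_id=99)
```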

## File 2: *11134_V_Motebello_Heistopp_Rev.ifc*
Now we want to perform the same process for file 2. We could copy and paste our previous code, but when you want to apply the same process to different data, it's much more useful to write a function. We are going to wrap the code we used in the previous section into a single function.

In [None]:
def create_IfcBuildingElement_dataframe(input_file):
    """Return a DataFrame with one row per IfcBuildingElement in input_file."""
    elements = input_file.by_type('IfcBuildingElement')
    building = input_file.by_type('IfcBuilding')[0] # here we are getting the first and only element of this list

    # The building is the same for every element, so we parse it once, outside the loop
    building_info = building.get_info(recursive=True)
    bdg_id = building_info["id"]
    bdg_name = building_info["Name"]

    elements_list = []

    for element in elements:
        element_info = element.get_info(recursive=True)

        element_id = element_info["id"]
        global_id = element_info["GlobalId"]
        element_type = element_info["type"]
        name = element_info["Name"]
        organization_id = element_info["OwnerHistory"]["OwningUser"]["TheOrganization"]["id"]
        description = element_info["Description"]

        df = pd.DataFrame({"element_id": element_id,
                            "global_id": global_id,
                            "element_type": element_type,
                            "name": name,
                            "organization_id": organization_id,
                            "description": description,
                            "bdg_id": bdg_id,
                            "bdg_name": bdg_name}, index=[0])

        elements_list.append(df)

    data = pd.concat(elements_list, ignore_index=True)

    return data

In [None]:
data2 = create_IfcBuildingElement_dataframe(file2)
data2.head()
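The function builds one single-row DataFrame per element and concatenates them at the end. A common alternative is to collect plain dictionaries and call `pd.DataFrame` once, which avoids creating many small DataFrames. The sketch below uses invented records in place of `get_info` output (and only a subset of the columns), so it runs without an IFC file; the name `records_to_dataframe` is hypothetical.

```python
import pandas as pd


def records_to_dataframe(records, building_name, building_id):
    """Build the element table with a single pd.DataFrame call (hypothetical helper)."""
    rows = []
    for info in records:
        # One plain dict per element; no intermediate DataFrames
        rows.append({
            "element_id": info["id"],
            "global_id": info["GlobalId"],
            "element_type": info["type"],
            "name": info["Name"],
            "description": info["Description"],
            "bdg_id": building_id,
            "bdg_name": building_name,
        })
    # A list of dicts becomes one row per dict, with a 0..n-1 index
    return pd.DataFrame(rows)


# Invented records shaped like (a subset of) get_info() output
records = [
    {"id": 1, "GlobalId": "g1", "type": "IfcWall", "Name": "Wall", "Description": None},
    {"id": 2, "GlobalId": "g2", "type": "IfcSlab", "Name": "Slab", "Description": "Floor"},
]
table = records_to_dataframe(records, "Example building", 99)
```

With a real file, `records` could be `[e.get_info(recursive=True) for e in file2.by_type('IfcBuildingElement')]`.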

# Export dataset
The last step is to concatenate the two datasets we just created and export the result.

In [None]:
ifc_parsed_data = pd.concat([data1, data2], ignore_index=True)
ifc_parsed_data
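With both files in one table, a quick sanity check is to count element types per building. `pd.crosstab` builds that table and fills combinations that never occur with `0`. A sketch with invented rows that use the same column names as `ifc_parsed_data`:

```python
import pandas as pd

# Invented rows with the same column names as ifc_parsed_data
parsed = pd.DataFrame({
    "bdg_name": ["House", "House", "House", "Office"],
    "element_type": ["IfcWall", "IfcWall", "IfcSlab", "IfcWall"],
})

# Rows: buildings, columns: element types, values: number of elements
type_counts = pd.crosstab(parsed["bdg_name"], parsed["element_type"])
```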

And we export it with the pandas method `to_csv`. We use the parameter `index=False` to omit the index column when saving. Since we pass a relative path, the file is saved in the current working directory, `/kaggle/working/`.

In [None]:
ifc_parsed_data.to_csv("ifc_parsed_data.csv", index=False)
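To check the export, you can read the CSV back with `pd.read_csv`. Missing values such as an empty `description` are written as empty fields and come back as `NaN`. This sketch writes invented rows to a temporary directory instead of `/kaggle/working/`:

```python
import os
import tempfile

import pandas as pd

# Invented rows; the second description is missing
df = pd.DataFrame({"element_id": [1, 2], "description": ["Exterior wall", None]})

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "ifc_parsed_data.csv")
    df.to_csv(path, index=False)  # no index column in the file
    loaded = pd.read_csv(path)    # empty field is read back as NaN
```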