Skip to content

Data Quality Assessment is a repository that contains the source code developed for the article named "Development and evaluation of Artificial Intelligence techniques for IoT data quality assessment and curation" published in the Internet of Things journal.

License

Notifications You must be signed in to change notification settings

lauramartingonzalezzz/DQAssessment

Repository files navigation

Data Quality Assessment

Data Quality Assessment is a repository that contains the source code developed in the article [1], as well as the dataset used.

This repository comprises 9 files and 3 folders:

  • AI-enabled_data_quality_improvement_techniques.ipynb is a Jupyter notebook presenting the improvement workflow of a raw dataset using domain knowledge and AI techniques.
  • DQ_dimensions_performance.py is a Python script in charge of launching Monte Carlo simulations to obtain runtime performance when assessing the chosen quality dimensions.
  • main.m is the Matlab script that is responsible for obtaining the performance metrics of the output results of the previous file.
  • DQ_dimensions_performance.ipynb is a Jupyter notebook that depicts the performance of each quality dimension in terms of delay.
  • configuration_variables.py is a Python script used as a configuration file.
  • basic_operations.py is a Python script that comprises different base functions.
  • context_broker_api.py is a Python script that defines the API requests needed to interact with the Context Broker.
  • requirements.txt is the standard file listing the PyPI packages to be installed.
  • README.md is this documentation.
  • /raw_data contains the dataset used prior to its assessment. (data can be downloaded here)
  • /data is an empty folder tree which will be filled in as the AI-enabled_data_quality_improvement_techniques.ipynb script is run.
  • /simulations is an empty folder tree which will contain a subfolder for each of the Monte Carlo simulations obtaining the delay in the quality dimensions processes (DQ_dimensions_performance.py). It also includes an empty subfolder named /median_values, where some of the results of the Matlab script (main.m) will be stored for later use in the Jupyter notebook DQ_dimensions_performance.ipynb.

[1] L. Martín, L. Sánchez, J. Lanza, and P. Sotres, “Development and evaluation of Artificial Intelligence techniques for IoT data quality assessment and curation,” Internet of Things, vol. 22, p. 100779, Jul. 2023, doi: 10.1016/J.IOT.2023.100779.]

Acknowledgements

This work was supported by the European Commission CEF Programme by means of the project SALTED ‘‘Situation-Aware Linked heTerogeneous Enriched Data’’ under the Action Number 2020-EU-IA-0274 and by the Spanish State Research Agency (AEI) by means of the project SITED ‘‘Semantically-enabled Interoperable Trustworthy Enriched Data-spaces’’ under Grant Agreement No. PID2021-125725OB-I00.

License

This material is licensed under the GNU Lesser General Public License v3.0 whose full text may be found at: LICENSE

About

Data Quality Assessment is a repository that contains the source code developed for the article named "Development and evaluation of Artificial Intelligence techniques for IoT data quality assessment and curation" published in the Internet of Things journal.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published