Unstructured Data Analysis and Modelling

Author: Quang Phong
Year: 2022

🧐 What?

"Unstructured Data Analysis and Modelling" is a series of sub-projects in which the author processes, analyzes, and builds Machine Learning models for data that are not structured or tabular. The data come in text, image, and audio formats.

🤷 Why?

Business digitization has increased the amount of data available within organizations, and firms and managers are now tasked with extracting insights from these new and expanding sources. The nature of these data creates two challenges. First, data are becoming increasingly unstructured (e.g., text 📜, visual 📸, and audio data 🎧), which calls for different methods of analysis than traditional structured data. Second, it is often unclear how these data can be used in day-to-day business operations. Firms and managers benefit from new insights only when the data can be processed and linked to business-relevant outcomes.

⚒️ How?

No.	Data	Unstructured format	Preprocessing techniques	Analytical methods
1	Movie reviews ✍️	Text	Text feature mining, special character removal, stemming	Sentiment analysis, revenue predictive modelling
2	Movie reviews ✍️	Text	Topic modelling, word embedding	Revenue predictive modelling
3	Human faces 👨🏿 👩🏻	Image	Pixel representation, principal component analysis (PCA)	Average face representation, original face recovery, face recognition
4	Clothing item images 👖	Image	Transparency removal, color extraction, line extraction, texture extraction	Brand classification, product classification
5	Human voice 🗣️	Audio	Acoustic feature extraction: frequency, standard deviation, jitter, shimmer, harmonic-to-noise ratio, pitch, loudness, timbre	-
6	Food-related pictures 🍜	Image	Color extraction, line extraction, texture extraction	Content classification
7	Vehicle sound 🚑	Audio	Acoustic feature extraction: frequency, standard deviation, jitter, shimmer, harmonic-to-noise ratio, pitch, loudness, timbre	Sound source (vehicle) recognition
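
As an illustration of the text preprocessing listed for sub-project 1 (special character removal and stemming), here is a minimal sketch using only the Python standard library. It is not the code from this repository: the function names, the suffix list, and the sample review are hypothetical, and the suffix-stripping stemmer is a toy stand-in for an established stemmer such as Porter's.

```python
import re
from collections import Counter

# Hypothetical suffix list for a toy stemmer (an assumption for this sketch);
# a real pipeline would more likely use an established stemming algorithm.
SUFFIXES = ("ing", "ed", "ly", "es", "s")


def remove_special_characters(text: str) -> str:
    """Lowercase the text and replace every character that is not
    a-z, 0-9, or whitespace with a space."""
    return re.sub(r"[^a-z0-9\s]", " ", text.lower())


def naive_stem(token: str) -> str:
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(review: str) -> list[str]:
    """Clean a review, split it on whitespace, and stem each token."""
    return [naive_stem(tok) for tok in remove_special_characters(review).split()]


def term_counts(review: str) -> Counter:
    """Count stemmed terms, a simple bag-of-words feature representation."""
    return Counter(preprocess(review))


if __name__ == "__main__":
    sample = "Loved it!!! The acting was amazing & the ending surprised me."
    print(preprocess(sample))
    print(term_counts(sample).most_common(3))
```

The resulting term counts could serve as input features for a sentiment or revenue model; which features the sub-project actually uses is documented in its report under deliverables.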

🧱 Structure?

This repository contains 4 folders:

data, including data for 7 sub-projects
src, including the main analysis and modelling code for 7 sub-projects
deliverables, including reports for 7 sub-projects
media, including media files
