quang-phong/unstructured-data-analysis-modelling

Unstructured Data Analysis and Modelling

Author: Quang Phong
Year: 2022

🧐 What?

"Unstructured Data Analysis and Modelling" is a series of sub-projects in which the author processes, analyzes, and builds Machine Learning models for data that are not structured or tabular. The data come in text, image, and audio formats.

🤷 Why?

Business digitization has increased the amount of data available within organizations. Firms and managers are now tasked with extracting insights from these new and expanding data sources. Challenges arise when we consider the nature of these new digital forms of data. One issue is that data is becoming increasingly unstructured (e.g., text 📜, visual 📸, and audio data 🎧), necessitating different methods of analysis than traditional (structured) data forms. Aside from that, it is unclear how we can use these data in day-to-day business operations. Firms and managers can benefit from new insights only when we can process these data and link them to business-relevant outcomes.

⚒️ How?

| No. | Data | Unstructured format | Preprocessing techniques | Analytical methods |
|-----|------|---------------------|--------------------------|--------------------|
| 1 | Movie reviews ✍️ | Text | Text feature mining, special character removal, stemming | Sentiment analysis, revenue predictive modelling |
| 2 | Movie reviews ✍️ | Text | Topic modelling, word embedding | Revenue predictive modelling |
| 3 | Human faces 👨🏿 👩🏻 | Image | Pixel representation, principal component analysis (PCA) | Average face presentation, original face recovering, face recognition |
| 4 | Clothing item images 👖 | Image | Transparency removal, color extraction, line extraction, texture extraction | Brand classification, product classification |
| 5 | Human voice 🗣️ | Audio | Acoustic feature extraction: frequency, standard deviation, jitter, shimmer, harmonic-to-noise ratio, pitch, loudness, timbre | - |
| 6 | Food-related pictures 🍜 | Image | Color extraction, line extraction, texture extraction | Content classification |
| 7 | Vehicle sound 🚑 | Audio | Acoustic feature extraction: frequency, standard deviation, jitter, shimmer, harmonic-to-noise ratio, pitch, loudness, timbre | Sound source (vehicle) recognition |
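
To make the text preprocessing in sub-project 1 more concrete, here is a minimal sketch of special character removal followed by a simple term count. It is illustrative only: this README does not state which language or libraries the code in src uses, so Python is an assumption, and the function names `clean_review` and `term_frequencies` and the sample reviews are hypothetical. Stemming is left out because it normally relies on a library such as NLTK's `PorterStemmer`.

```python
import re
from collections import Counter


def clean_review(text):
    """Lowercase a review, replace special characters with spaces, and tokenize."""
    text = text.lower()
    # Keep only letters, digits, and whitespace (hypothetical cleaning rule).
    text = re.sub(r"[^a-z0-9\s]", " ", text)
    return text.split()


def term_frequencies(reviews):
    """Count how often each cleaned token appears across all reviews."""
    counts = Counter()
    for review in reviews:
        counts.update(clean_review(review))
    return counts


# Made-up sample reviews, not taken from the repository's data folder.
reviews = [
    "A great, GREAT movie!!! Loved it :)",
    "Terrible plot... but great acting?",
]

print(clean_review(reviews[0]))
print(term_frequencies(reviews).most_common(3))
```

Token counts like these could serve as simple text features for a sentiment or revenue model. A real pipeline would likely add stop-word removal and stemming before counting.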

🧱 Structure?

This repository contains 4 folders:

  • data: data for the 7 sub-projects
  • src: main analysis and modelling code for the 7 sub-projects
  • deliverables: reports for the 7 sub-projects
  • media: media files
