iAmKankan/Data-Gathering-And-Preprocessing

Index

Data Preprocessing

Common Terms

Variables

  • Datasets often contain large volumes of data stored in formats that are not easy to work with.
  • That’s why data scientists must first make sure that the data is correctly formatted and conforms to a defined set of rules.
  • Data sparseness and formatting inconsistencies are the biggest challenges, and addressing them is what data cleansing is all about.

Data cleaning (also called data cleansing) is the task of identifying incorrect, incomplete, inaccurate, or irrelevant data, fixing those problems, and making sure that such issues are fixed automatically in the future.
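
The definition above can be sketched as a small, reusable cleaning routine. This is only a minimal illustration in plain Python, not the method used in this repository: the sample records, the field names (`name`, `age`, `signup`, `notes`), the accepted date formats, and the 0 to 120 age range are all assumptions made up for the example.

```python
from datetime import datetime

# Hypothetical raw records; field names and values are invented for illustration.
RAW_RECORDS = [
    {"name": " Alice ", "age": "34", "signup": "2021-03-05", "notes": "n/a"},
    {"name": "bob", "age": "29", "signup": "05/03/2021", "notes": ""},
    {"name": "", "age": "41", "signup": "2021-07-19", "notes": "vip"},
    {"name": "Carol", "age": "-5", "signup": "2021-13-01", "notes": ""},
]

RELEVANT_FIELDS = ("name", "age", "signup")  # assumption: "notes" is irrelevant here
DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y")      # assumption: the only formats we accept


def parse_date(text):
    """Return the date as an ISO 8601 string, or None if no known format matches."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text, fmt).date().isoformat()
        except ValueError:
            continue
    return None


def clean_record(record):
    """Return (cleaned_record, problems) for a single raw record."""
    problems = []
    # Irrelevant data: keep only the fields the analysis needs.
    cleaned = {key: record.get(key, "").strip() for key in RELEVANT_FIELDS}

    # Incomplete data: a missing name cannot be repaired, so flag it.
    if not cleaned["name"]:
        problems.append("missing name")
    else:
        cleaned["name"] = cleaned["name"].title()

    # Incorrect data: age must be an integer within the assumed plausible range.
    try:
        age = int(cleaned["age"])
    except ValueError:
        age = None
    if age is None or not 0 <= age <= 120:
        problems.append("invalid age")
        age = None
    cleaned["age"] = age

    # Inconsistent formatting: normalize every date to one representation.
    cleaned["signup"] = parse_date(cleaned["signup"])
    if cleaned["signup"] is None:
        problems.append("invalid signup date")

    return cleaned, problems


def clean_dataset(records):
    """Split records into clean rows and rejected rows with their reasons."""
    clean, rejected = [], []
    for record in records:
        cleaned, problems = clean_record(record)
        if problems:
            rejected.append((record, problems))
        else:
            clean.append(cleaned)
    return clean, rejected


clean_rows, rejected_rows = clean_dataset(RAW_RECORDS)
```

Because the rules live in one function, running `clean_dataset` on every new batch is one way to get the "fixed automatically in the future" part of the definition; rejected rows are returned with their reasons rather than silently dropped, so they can be reviewed.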

  • According to Appen, data scientists spend 60% of their time organizing and cleansing data!
