# Data Collection

There are two aspects of data collection that are the most important:

- Collecting the *right* information
- Collecting *correct* information

To illustrate these concepts, let's use an example: what used to be called balancing the checkbook. Now that few people use checkbooks, it is probably more appropriately called reconciling the finances.

Regardless of what you call it, the general premise is to record how much money you spend and compare that against what your bank account says you've been charged.

## Getting the Right Information

Getting the right information is harder than it sounds. Without the right data you can't answer questions, analyze trends, make predictions, or recommend future action. In my experience, there are two common mistakes: collecting the wrong data and collecting too much or too little data.

Using the balance-the-finances example, in order to compare what I spent with what my bank account reflects, I need to record actual expenditures. Here are some examples of how I could go wrong when collecting data to balance my finances:

### Wrong data

- Recording the base price of every item in the store. This won't do me any good because I have no indication of which items I actually purchased
- Recording the base price of the items I purchase when they're on sale. This also won't do me any good because I didn't pay the base price; I paid the sale price. Also, if I just record the base price, I haven't taken into account the taxes I paid on the item
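
As a rough arithmetic sketch of the second bullet, the snippet below compares a base price with the amount actually paid. The 15% discount, the 8% sales tax, and the assumption that tax is applied to the sale price are all hypothetical; the function name `amount_paid` is made up for illustration.

```python
from decimal import Decimal, ROUND_HALF_UP

def amount_paid(base_price, discount_rate, tax_rate):
    """Return what leaves my account: sale price plus tax, rounded to cents."""
    sale_price = base_price * (1 - discount_rate)
    total = sale_price * (1 + tax_rate)
    return total.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)

# Hypothetical item: $20.00 base price, 15% off, 8% sales tax.
base_price = Decimal("20.00")
paid = amount_paid(base_price, Decimal("0.15"), Decimal("0.08"))

print(base_price, paid)  # 20.00 18.36
```

The bank statement will show 18.36, so a record of 20.00 can never be matched against it.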

### Too much or too little information

- Recording the base price, sale price, and tax amount will (eventually) get me the information I need, but there is a lot of info there that I don't need. This adds a layer of complexity before I can get my answer
- Recording only the purchases I make on the weekends. This will allow me to account for only a small portion of my expenses and will probably exclude all deposits, since paychecks are usually deposited during business hours on weekdays

The right information to collect in this particular instance is the amount I paid or deposited no matter the payment method or time/date of the interaction.
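
Here is a minimal sketch of what reconciling could look like once the right information is recorded. It assumes each transaction is stored as a signed amount (negative for money paid, positive for money deposited); the sample amounts and the `reconcile` helper are hypothetical.

```python
from collections import Counter
from decimal import Decimal

def reconcile(recorded, statement):
    """Return (amounts only in my records, amounts only on the bank statement)."""
    mine = Counter(recorded)
    bank = Counter(statement)
    # Counter subtraction keeps only the amounts left over on each side.
    only_in_records = list((mine - bank).elements())
    only_on_statement = list((bank - mine).elements())
    return only_in_records, only_on_statement

# Hypothetical data: what I wrote down vs. what the bank reports.
recorded = [Decimal("-42.17"), Decimal("-8.50"), Decimal("1500.00")]
statement = [Decimal("-42.17"), Decimal("1500.00"), Decimal("-12.99")]

only_in_records, only_on_statement = reconcile(recorded, statement)
print(only_in_records)    # amounts I recorded that the bank doesn't show
print(only_on_statement)  # amounts the bank shows that I never recorded
```

Both lists are empty only when every amount I recorded matches an amount on the statement, which is why the amount actually paid or deposited is the value worth collecting.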

## Population Sampling

In an era when all digital interactions have the potential to be recorded, it's easy to fall into the trap of thinking all information can be obtained about the area of interest. However, it's important to keep in mind whether you're actually seeing *all* data of interest.

For example, let's say I only record the purchases I make with my credit card. If I make all my purchases with a credit card then this is fine. But if I use check, cash, money order or bitcoin, even if it's only rarely, then I'll be missing data and my finances won't balance.

Perhaps a more subtle distinction would be what I balance my expenses and deposits against. If I only look at the charges and deposits in my checking account, but have set up a single payor to send funds directly to my savings account, my finances also won't balance.
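
The gap left by partial coverage can be sketched in a few lines. The payment methods and amounts below are hypothetical, and `total` is an illustrative helper rather than part of any library.

```python
from decimal import Decimal

# Hypothetical transactions: (payment method, amount); negative = money spent.
transactions = [
    ("credit", Decimal("-30.00")),
    ("credit", Decimal("-12.50")),
    ("cash", Decimal("-20.00")),
    ("check", Decimal("-100.00")),
]

def total(rows, methods=None):
    """Sum amounts, optionally keeping only the given payment methods."""
    return sum(
        (amount for method, amount in rows if methods is None or method in methods),
        Decimal("0"),
    )

all_spending = total(transactions)
credit_only = total(transactions, methods={"credit"})
unrecorded = all_spending - credit_only

print(all_spending, credit_only, unrecorded)  # -162.50 -42.50 -120.00
```

If I only record credit card purchases, the 120.00 spent by cash and check never appears in my records, and my finances are off by exactly that amount.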

We can see the same concepts in large real-world data collection. Political polling is a great example. Any poll that occurs solely online only polls a narrow subset of the population: the people who go online, who frequent whatever site the poll was posted to, and who are inclined to answer polling questions. The same is true of polls that call citizens; that subset includes only people who have a landline or haven't put their cell number on the do-not-call list, are willing to answer a call from an unknown number, and are inclined to answer polling questions.

Retail and customer service industries encounter the same challenges. Are all interactions recorded digitally, or do some happen in person or over the phone, where they are harder to capture? Are you seeing all your customers, only the ones who are having problems, or only the ones who are satisfied?

When setting up data collection, it's important to be aware of any limitations, considerations, or caveats that apply to the population sampled and the data that's available for collection.
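
To make the effect of a limited sample concrete, here is a small sketch with made-up numbers: a population split into people who can and cannot be reached online, with different levels of support for some candidate. All group sizes and counts are hypothetical.

```python
# Hypothetical population, split by whether people can be reached online.
groups = {
    "online": {"people": 600, "supporters": 420},
    "offline": {"people": 400, "supporters": 160},
}

def support_rate(group_names):
    """Share of supporters among everyone in the listed groups."""
    people = sum(groups[name]["people"] for name in group_names)
    supporters = sum(groups[name]["supporters"] for name in group_names)
    return supporters / people

true_rate = support_rate(["online", "offline"])  # 580 of 1000 people
online_poll_rate = support_rate(["online"])      # 420 of 600 people

print(f"{true_rate:.0%} vs {online_poll_rate:.0%}")  # 58% vs 70%
```

Even if every reachable person answered, the online-only poll would overstate support by 12 percentage points in this made-up population, simply because of who was left out.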

## Data Quality

### Reliability

### Validity

## Data Type

### Quantitative: Numbers

### Qualitative: Categorizations

## Sources

In today's world, where the price of data storage continues to decline and our ability to track and record actions and information continues to increase, data can come from many sources.

[Scribbr](https://www.scribbr.com/) has a nice table of [data collection methods](https://www.scribbr.com/methodology/data-collection/#step-2-choose-your-data-collection-method). Some of the more common business data sources include:

- Observation: measuring events, actions, etc without attempting to affect them
  - Examples:
    - Internet of Things (IoT): just about any device that records and sends digital information falls into this category. It can include cell phones, smart TVs, internet browsers, smart appliances, and many more
    - Remote sensors: as technology spreads into diverse sectors, digital sensors and meters that send information back for collection and use are becoming more prevalent. This can be anything from the electricity and water meters at a residence to meters at a nuclear power plant to satellites recording daily temperature by geolocation
    - Businesses often record information about their interactions with their customers, vendors, suppliers, etc. All of the information collected about events that occur during regular business practices counts as observation
- Survey: predetermined questions that are asked either verbally (in person, on the phone, by video conference, etc) or in writing (email, letter, etc). Caution: survey questions must be crafted carefully because the way they're worded can influence the answers respondents provide, thus biasing the results
  - Examples:
    - Customer service is often gauged using customer satisfaction surveys. These can be done via interactive phone surveys (the customer pushes a number key to indicate their answer), a postcard-sized written survey, a follow-up email, or the representative who provided the service, to name a few
    - The arboretum society keeps sending me a mailed survey asking about information on the trees in my area
    - Medical research frequently uses surveys to get background information on patients and to gauge patient satisfaction with the service provided
- Interview/focus group: open-ended questions asked by an interviewer in either a one-on-one or group setting
  - Examples:
    - Market research uses focus groups extensively to test how a product will be received by consumers
    - The entertainment industry uses focus groups to test audience reaction to movies, sometimes showing different groups alternate endings, beginnings, etc to see what works best
- Archival research: historical information such as manuscripts, documents, records, repositories, photographs, or other historic records
  - Example:
    - Climate research often uses archival data. This can include such data as satellite data over the last 20 years, pictures of glaciers from 50 years ago, and written descriptions/records of major events such as floods or earthquakes
- Secondary data: data that has already been collected, usually by sources such as governments or research organizations
  - Example:
    - Businesses frequently supplement the information they gather with secondary sources. Such secondary data could be third-party records of the physical location of an IP address or the number of people in a household

# Resources

- [Wikipedia - Data collection](https://en.wikipedia.org/wiki/Data_collection)
- [Responsible Conduct in Data Management - Data Collection](https://ori.hhs.gov/education/products/n_illinois_u/datamanagement/dctopic.html)
- [Data Collection | A Step-by-Step Guide with Methods and Examples](https://www.scribbr.com/methodology/data-collection/)
- [data collection](https://www.techtarget.com/searchcio/definition/data-collection)
- [Data Collection: How to Get Started](https://www.dimagi.com/data-collection/)
- [What Is Data Collection: Methods, Types, Tools, and Techniques](https://www.simplilearn.com/what-is-data-collection-article)
- [Data Collection: Best Methods + Practical Examples](https://www.iteratorshq.com/blog/data-collection-best-methods-practical-examples/)
- [What are the Methods of Data Collection?](https://www.lotame.com/what-are-the-methods-of-data-collection/)