Thanks again for attending our session at the Gamma Iota Sigma Regional Conference or the IABA Annual Meeting!
I'm an independent contractor helping companies build custom cloud apps and leverage data science, visual analytics, and AI. I offer low introductory rates, free consultation and estimates, and no minimums, so contact me today and let's chat about how I can help!
https://www.bryce-chamberlain.com/
Here are the resources from the talk:
- Data Visualization in Business Communication.pdf: Presentation slides.
- insurance.csv: Data used in the slides. From https://www.kaggle.com/datasets/simranjain17/insurance.
- stepwise.pbix: Power BI document shown in the slides.
- stepwise.pdf: PDF built with Adobe Illustrator.
- dummy-data/: Files for the example of asking ChatGPT to explain a dataset with known properties, for testing purposes.
- gpt-convos/: Interesting conversations with ChatGPT. For now, there is just one, which covers finding a story in data and creating a plot. We may add more later as we come across them. Download these files and open them in a web browser.
Here are some data sources and software you might be interested in:
Data Sources
- https://www.kaggle.com/datasets - data sets linked to code and analysis.
- https://paperswithcode.com/datasets
- https://datasetsearch.research.google.com/
- https://archive.ics.uci.edu/datasets
- https://www.reddit.com/r/datasets/
- https://aws.amazon.com/marketplace/
- https://github.com/fivethirtyeight/data
- https://podcasts.apple.com/us/podcast/the-alternative-data-podcast/id1539909575
- https://www.battlefin.com/events/miami-2024
- Your college or university library may have data subscriptions available.
- Government (national):
  - https://data.gov/
  - https://www.census.gov/data.html
  - https://www.census.gov/programs-surveys/susb.html: counts of businesses and employees by size, location, and industry.
  - https://github.com/superchordate/ASEC-census-helper: download over 500 person and household-based survey responses over the last 5 years.
- Government (local): cities will often have data portals, too! For example: https://data.cityofchicago.org/
- Publicly traded companies:
  - https://www.sec.gov/edgar/search-and-access for lists of companies and SEC financial disclosures.
  - https://www.simfin.com/en/fundamental-data-download/ for clean company datasets.
- Geographic Data:
  - Map zip codes to counties and longitude/latitude: https://download.geonames.org/export/dump/ > US.zip
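
If you want to use a zip code file like this in code, here is a minimal Python sketch of building a zip-to-county lookup. It assumes the file follows the tab-delimited, header-less GeoNames postal code layout (country code, postal code, place name, admin name1, admin code1, admin name2, admin code2, admin name3, admin code3, latitude, longitude, accuracy), with the county in admin name2. Please confirm the column order against the readme that ships with your download. The function name `load_zip_lookup` and the sample rows are hypothetical, made up for illustration.

```python
import csv
import io

# Column order ASSUMED from the GeoNames postal code readme; verify it
# against the readme that ships with your download.
COLUMNS = [
    "country_code", "postal_code", "place_name",
    "admin_name1", "admin_code1",   # state name and abbreviation
    "admin_name2", "admin_code2",   # county name and code
    "admin_name3", "admin_code3",
    "latitude", "longitude", "accuracy",
]


def load_zip_lookup(lines):
    """Build {zip code: county, state, latitude, longitude} from tab-delimited lines."""
    reader = csv.DictReader(
        lines, fieldnames=COLUMNS, delimiter="\t", quoting=csv.QUOTE_NONE
    )
    lookup = {}
    for row in reader:
        # Keep the zip code as a string so leading zeros are preserved.
        lookup[row["postal_code"]] = {
            "county": row["admin_name2"],
            "state": row["admin_code1"],
            "latitude": float(row["latitude"]),
            "longitude": float(row["longitude"]),
        }
    return lookup


# Hypothetical sample rows, made up for illustration (not real GeoNames records).
sample = io.StringIO(
    "US\t00001\tExampleville\tExample State\tEX\tExample County\t001\t\t\t40.0\t-90.0\t4\n"
    "US\t00002\tSampletown\tExample State\tEX\tSample County\t003\t\t\t41.5\t-91.25\t4\n"
)
zip_lookup = load_zip_lookup(sample)
print(zip_lookup["00001"]["county"])
```

With a real download, you would pass an open file handle for the extracted text file instead of the in-memory sample. Zip codes stay as strings on purpose, since reading them as numbers drops leading zeros.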
AutoML
- https://chat.openai.com/ (ChatGPT; requires Code Interpreter, which is a paid feature)
- https://github.com/superchordate/storyteller
- https://rapidminer.com
- https://www.datarobot.com
- https://aws.amazon.com/sagemaker
- https://developer.apple.com/machine-learning/create-ml
- https://docs.h2o.ai/h2o/latest-stable/h2o-docs/automl.html
- https://github.com/ydataai/pandas-profiling (neat Python tool for Exploratory Data Analysis, not complete AutoML)
Business Intelligence
- https://chat.openai.com/ (ChatGPT; requires Code Interpreter, which is a paid feature)
- https://powerbi.microsoft.com/en-us
- https://www.tableau.com
Design
- https://www.adobe.com/products/illustrator/free-trial-download.html
- https://inkscape.org
- https://imagej.net/software/fiji
R Packages
I recommend exploring and visualizing data in Power BI, but if you need to modify or preprocess data, R is a good solution. Keep in mind that Power BI includes Power Query, which is pretty good for preprocessing.
Here are some packages that I use a lot:
- easyr: This package makes things that were historically difficult in R easier. In particular, `read.any` helps with reading files (it reads most data formats automatically), `todate`/`tonum` flexibly convert characters to dates or numbers and cover more edge cases than other similar functions, and `jrepl` joins and replaces data from related datasets, turning a 2-step operation into one while checking to confirm data isn't duplicated in the join. See the docs on GitHub for more useful functions.
- dplyr: This package is the reason R is better for data manipulation. It makes working with data very intuitive and easy.
- fakeR: Use it to create dummy datasets you can send to ChatGPT Code Interpreter to generate code samples.
If you do use code, make sure to check out the Git Guide at https://github.com/casact/meta/blob/master/git-guide/git-guide.md.