by IBM via Coursera (completed)
- Course 1. Introduction to Data Engineering
- Course 2. Python for Data Science, AI & Development
- Course 3. Python Project for Data Engineering
- Course 4. Introduction to Relational Databases (RDBMS)
- Course 5. Databases and SQL for Data Science with Python
- Course 6. Hands-on Introduction to Linux Commands and Shell Scripting
- Course 7. Relational Database Administration (DBA)
- Course 8. ETL and Data Pipelines with Shell, Airflow and Kafka
- Course 9. Getting Started with Data Warehousing and BI Analytics
- Course 10. Introduction to NoSQL Databases
- Course 11. Introduction to Big Data with Spark and Hadoop
- Course 12. Machine Learning with Apache Spark
- Course 13. Data Engineering Capstone Project
1. Introduction to Data Engineering
- List basic skills required for an entry-level data engineering role.
- Discuss various stages and concepts in the data engineering lifecycle.
- Describe data engineering technologies such as Relational Databases, NoSQL Data Stores, and Big Data Engines.
- Summarize concepts in data security, governance, and compliance.
2. Python for Data Science, AI & Development
- Describe Python Basics including Data Types, Expressions, Variables, and Data Structures.
- Apply Python programming logic using Branching, Loops, Functions, Objects & Classes.
- Demonstrate proficiency in using Python libraries such as Pandas, Numpy, and Beautiful Soup.
- Access web data using APIs and web scraping from Python in Jupyter Notebooks.
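The basics listed above (data structures, branching, loops, functions) can be sketched in a few lines; the function and field names here are illustrative, not taken from the course:

```python
# A minimal sketch of core Python constructs: a list as input, a dict as
# accumulator, a loop, branching, and a function with a docstring.

def summarize_scores(scores):
    """Return basic statistics for a list of numeric scores."""
    stats = {"count": len(scores), "total": 0, "passing": []}
    for s in scores:              # loop over a list
        stats["total"] += s
        if s >= 60:               # branching
            stats["passing"].append(s)
    stats["mean"] = stats["total"] / stats["count"] if scores else 0
    return stats

print(summarize_scores([55, 70, 90]))
```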
3. Python Project for Data Engineering
- Demonstrate your skills in Python for working with and manipulating data.
- Implement webscraping and use APIs to extract data with Python.
- Play the role of a Data Engineer working on a real project to extract, transform, and load data.
- Use Jupyter notebooks and IDEs to complete your project.
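An extract-transform-load flow of the kind this project covers can be sketched with the standard library alone; the sample data, field names, and filtering rule below are hypothetical, and a real pipeline would read from files or APIs rather than an inline string:

```python
# A toy ETL sketch: extract records from CSV text, transform them
# (type conversion and filtering), and load them as JSON.
import csv
import io
import json

def extract(csv_text):
    # Parse CSV rows into dictionaries keyed by the header row.
    return list(csv.DictReader(io.StringIO(csv_text)))

def transform(rows):
    # Convert price to float and keep only in-stock items.
    return [{"name": r["name"], "price": float(r["price"])}
            for r in rows if r["in_stock"] == "yes"]

def load(rows):
    # Serialize the cleaned records; a real load step would write to a target store.
    return json.dumps(rows)

raw = "name,price,in_stock\nwidget,9.99,yes\ngadget,4.50,no\n"
print(load(transform(extract(raw))))
```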
4. Introduction to Relational Databases (RDBMS)
- Describe data, databases, relational databases, and cloud databases.
- Describe information and data models, relational databases, and relational model concepts (including schemas and tables).
- Explain an Entity Relationship Diagram and design a relational database for a specific use case.
- Develop a working knowledge of popular DBMSes including MySQL, PostgreSQL, and IBM DB2.
5. Databases and SQL for Data Science with Python
- Analyze data within a database using SQL and Python.
- Create a relational database and work with multiple tables using DDL commands.
- Construct basic to intermediate level SQL queries using DML commands.
- Compose more powerful queries with advanced SQL techniques like views, transactions, stored procedures, and joins.
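The DDL, DML, and join techniques above can be tried from Python using the built-in sqlite3 module; the table and column names here are invented for illustration:

```python
# DDL (CREATE TABLE), DML (INSERT), and a join with an aggregate,
# run against an in-memory SQLite database.
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
# DDL: create two related tables.
cur.execute("CREATE TABLE dept (id INTEGER PRIMARY KEY, name TEXT)")
cur.execute("CREATE TABLE emp (id INTEGER PRIMARY KEY, name TEXT, dept_id INTEGER)")
# DML: insert rows.
cur.execute("INSERT INTO dept VALUES (1, 'Data')")
cur.executemany("INSERT INTO emp VALUES (?, ?, ?)",
                [(1, 'Ada', 1), (2, 'Grace', 1)])
# Query with a join and an aggregate.
rows = cur.execute("""
    SELECT d.name, COUNT(*) FROM emp e
    JOIN dept d ON e.dept_id = d.id
    GROUP BY d.name
""").fetchall()
print(rows)  # [('Data', 2)]
```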
6. Hands-on Introduction to Linux Commands and Shell Scripting
- Describe the Linux architecture and common Linux distributions, and update and install software on a Linux system.
- Perform common informational, file, content, navigational, compression, and networking commands in Bash shell.
- Develop shell scripts using Linux commands, environment variables, pipes, and filters.
- Schedule cron jobs in Linux with crontab and explain the cron syntax.
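The cron syntax mentioned above uses five time fields (minute, hour, day of month, month, day of week) followed by the command to run. A sample crontab entry (the script path and log file are hypothetical) might look like:

```
# min hour day-of-month month day-of-week  command
30 2 * * 1-5 /home/user/scripts/backup.sh >> /tmp/backup.log 2>&1
```

This would run the script at 02:30 on weekdays, appending both stdout and stderr to the log file.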
7. Relational Database Administration (DBA)
- Create, query, and configure databases, and access and build system objects such as tables.
- Perform basic database management including backing up and restoring databases as well as managing user roles and permissions.
- Monitor and optimize important aspects of database performance.
- Troubleshoot database issues such as connectivity, login, and configuration, and automate functions such as reports, notifications, and alerts.
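Backing up and restoring, one of the DBA tasks above, can be illustrated in miniature with the sqlite3 backup API from the Python standard library; real DBA work would use the DBMS's own tooling (e.g. pg_dump for PostgreSQL), and the table here is invented:

```python
# Copy a live SQLite database into a second connection using the
# Connection.backup() API; the destination could equally be a file.
import sqlite3

src = sqlite3.connect(":memory:")
src.execute("CREATE TABLE accounts (id INTEGER, owner TEXT)")
src.execute("INSERT INTO accounts VALUES (1, 'alice')")
src.commit()

dest = sqlite3.connect(":memory:")
src.backup(dest)  # online backup of the source database

print(dest.execute("SELECT owner FROM accounts").fetchall())  # [('alice',)]
```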
8. ETL and Data Pipelines with Shell, Airflow and Kafka
- Describe and contrast Extract, Transform, Load (ETL) processes and Extract, Load, Transform (ELT) processes.
- Explain batch vs concurrent modes of execution.
- Implement an ETL pipeline through shell scripting.
- Describe data pipeline components, processes, tools, and technologies.
9. Getting Started with Data Warehousing and BI Analytics
- Explore the architecture, features, and benefits of data warehouses, data marts, and data lakes and identify popular data warehouse system vendors.
- Design and populate a data warehouse, and model and query data using CUBE, ROLLUP, and materialized views.
- Identify popular data analytics and business intelligence tools and vendors and create data visualizations using IBM Cognos Analytics.
- Design and load data into a data warehouse, write aggregation queries, create materialized query tables, and create an analytics dashboard.
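The aggregation queries above run against a star schema (a fact table joined to dimension tables). CUBE and ROLLUP themselves need a warehouse engine such as PostgreSQL or Db2, so this sketch uses a plain GROUP BY over an invented schema:

```python
# A toy star schema in SQLite: one dimension table, one fact table,
# and an aggregation query grouped by a dimension attribute.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dim_store (store_id INTEGER PRIMARY KEY, city TEXT);
    CREATE TABLE fact_sales (store_id INTEGER, amount REAL);
    INSERT INTO dim_store VALUES (1, 'Austin'), (2, 'Boston');
    INSERT INTO fact_sales VALUES (1, 100.0), (1, 50.0), (2, 75.0);
""")
rows = conn.execute("""
    SELECT s.city, SUM(f.amount) FROM fact_sales f
    JOIN dim_store s ON f.store_id = s.store_id
    GROUP BY s.city ORDER BY s.city
""").fetchall()
print(rows)  # [('Austin', 150.0), ('Boston', 75.0)]
```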
10. Introduction to NoSQL Databases
- Differentiate between the four main categories of NoSQL repositories.
- Describe the characteristics, features, benefits, limitations, and applications of the more popular Big Data processing tools.
- Perform common tasks in MongoDB, including create, read, update, and delete (CRUD) operations.
- Execute keyspace, table, and CRUD operations in Cassandra.
11. Introduction to Big Data with Spark and Hadoop
- Explain the impact of big data, including use cases, tools, and processing methods.
- Describe Apache Hadoop architecture, ecosystem, practices, and user-related applications, including Hive, HDFS, HBase, Spark, and MapReduce.
- Apply Spark programming basics, including parallel programming basics for DataFrames, data sets, and Spark SQL.
- Use Spark’s RDDs and data sets, optimize Spark SQL using Catalyst and Tungsten, and use Spark’s development and runtime environment options.
12. Machine Learning with Apache Spark
- Describe ML, explain its role in data engineering, summarize generative AI, discuss Spark's uses, and analyze ML pipelines and model persistence.
- Evaluate ML models, distinguish between regression, classification, and clustering models, and compare data engineering pipelines with ML pipelines.
- Construct the data analysis processes using Spark SQL, and perform regression, classification, and clustering using SparkML.
- Demonstrate connecting to Spark clusters, building ML pipelines, performing feature extraction and transformation, and persisting models.
13. Data Engineering Capstone Project
- Demonstrate proficiency in skills required for an entry-level data engineering role.
- Design and implement various concepts and components in the data engineering lifecycle such as data repositories.
- Showcase working knowledge with relational databases, NoSQL data stores, big data engines, data warehouses, and data pipelines.
- Apply skills in Linux shell scripting, SQL, and Python programming languages to Data Engineering problems.