Nawin #18

Merged
merged 2 commits into from
Aug 7, 2023
1 change: 1 addition & 0 deletions README.md
@@ -47,6 +47,7 @@ _Navaneeth Malingan_
- [PyImageSearch](https://www.pyimagesearch.com/start-here/)
- [5 Beginner Friendly Steps to Learn Machine Learning and Data Science with Python](https://www.mrdbourke.com/5-beginner-friendly-steps-to-learn-machine-learning/)


## Intro to ML

- [Luis Serrano: A Friendly Introduction to Machine Learning](https://www.youtube.com/watch?v=IpGxLWOIZy4)
36 changes: 36 additions & 0 deletions data_engineering/README.md
@@ -0,0 +1,36 @@
# Data Engineering Resources


Data engineering is a field of work that involves **designing, building, and managing the infrastructure** and systems required to **collect, store, process, and analyze data**. Data engineers play a crucial role in the data lifecycle, ensuring that data is available, accessible, and reliable for various data-driven applications and decision-making processes.

---
## Here are some key resources for Data Engineering
---
### Batch Processing


**Batch processing** is a data processing technique in which data is collected over a period of time and processed as a group, or batch, rather than in real time or immediately upon arrival. To understand the basics of Data Engineering and batch processing, see these resources:

- [Understanding Data Engineering by Datacamp](https://app.datacamp.com/learn/courses/understanding-data-engineering)
- [Introduction to Data Engineering by Datacamp](https://app.datacamp.com/learn/courses/introduction-to-data-engineering)
- [Apache Spark Tutorial (used for Large Scale Data Processing using SQL commands)](https://spark.apache.org/docs/latest/sql-getting-started.html)
- [Test your knowledge using ProjectPro](https://www.projectpro.io/article/big-data-interview-questions-/773)
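
The idea of processing data in predefined batches can be illustrated with a minimal Python sketch. It is not taken from any of the resources above: the `batched` and `process_batch` helpers and the sample `readings` are hypothetical names chosen for illustration, and a real batch job would typically read from files or a database and run on a schedule.

```python
from itertools import islice
from typing import Iterable, Iterator, List


def batched(records: Iterable[int], batch_size: int) -> Iterator[List[int]]:
    """Yield successive lists holding at most batch_size records each."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    iterator = iter(records)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


def process_batch(batch: List[int]) -> int:
    """Hypothetical batch job: sum the values collected in one batch."""
    return sum(batch)


# Assumed sample data standing in for records collected over time.
readings = [3, 5, 2, 8, 1, 4, 7]
batch_totals = [process_batch(b) for b in batched(readings, batch_size=3)]
print(batch_totals)  # [10, 13, 7]
```

Nothing is computed until a full batch (or the final partial one) is available, which is the main contrast with stream processing.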


### Stream Processing
**Stream processing** is a method of data processing that continuously processes and analyzes data as it is generated or received, in real time. It handles data in motion, allowing immediate insights and actions based on the streaming data. Here are some resources to refer to:
- [Introduction to Apache Kafka Streams](https://kafka.apache.org/documentation/streams/)
- [Apache Flink Documentation](https://nightlies.apache.org/flink/flink-docs-release-1.13/docs/try-flink/datastream/)
- [Stream Processing Quiz](https://chauff.github.io/documents/bdp-quiz/streaming.html)
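
As a rough illustration of handling data in motion, the sketch below updates a result after every incoming event instead of waiting for a complete dataset. It uses a plain Python generator rather than Kafka Streams or Flink; the `running_mean` function and the sample events are assumptions made for this example only.

```python
from typing import Iterable, Iterator


def running_mean(events: Iterable[float]) -> Iterator[float]:
    """Emit the mean of all values seen so far, once per incoming event."""
    count = 0
    total = 0.0
    for value in events:
        count += 1
        total += value
        # A result is available immediately after each event arrives.
        yield total / count


# Assumed stand-in for an unbounded source such as a message queue.
stream = iter([10.0, 20.0, 30.0])
means = list(running_mean(stream))
print(means)  # [10.0, 15.0, 20.0]
```

Only a small amount of state (`count` and `total`) is kept between events, so the same pattern works even when the stream never ends.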


### Data Pipelines and Integration

**Data pipelines and integration** are critical components of data engineering that involve moving, transforming, and integrating data from various sources into a destination for further processing, analysis, or storage. They ensure that data flows seamlessly and reliably across different systems, enabling efficient data management and utilization. Refer to these resources:
- [Building Data Engineering Pipelines in Python](https://app.datacamp.com/learn/courses/building-data-engineering-pipelines-in-python)
- ["What is Data Integration?" by Talend](https://www.talend.com/resources/what-is-data-integration/)
- [Data Cleaning Challenge: Handling missing values](https://www.kaggle.com/code/rtatman/data-cleaning-challenge-handling-missing-values/notebook)
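
The movement, transformation, and loading steps described above can be sketched as a tiny extract-transform-load pipeline using only the Python standard library. The CSV content, the `weather` table, and the choice to drop rows with missing values are all assumptions made for illustration; real pipelines often impute missing values or route bad rows elsewhere.

```python
import csv
import io
import sqlite3

# Assumed sample source data; the Mumbai row has a missing temperature.
RAW_CSV = """city,temperature_c
Chennai,31.5
Mumbai,
Delhi,28.0
"""


def extract(text):
    """Read CSV text into a list of dictionaries keyed by column name."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Drop rows with a missing temperature and convert the rest to floats."""
    cleaned = []
    for row in rows:
        if not row["temperature_c"]:
            continue  # one simple way to handle a missing value
        cleaned.append((row["city"], float(row["temperature_c"])))
    return cleaned


def load(rows, conn):
    """Write the cleaned rows into a hypothetical destination table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS weather (city TEXT, temperature_c REAL)"
    )
    conn.executemany("INSERT INTO weather VALUES (?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")  # in-memory database as the destination
load(transform(extract(RAW_CSV)), conn)
loaded = conn.execute(
    "SELECT city, temperature_c FROM weather ORDER BY city"
).fetchall()
print(loaded)  # [('Chennai', 31.5), ('Delhi', 28.0)]
```

Keeping each stage as a separate function makes it easier to test stages in isolation and to swap the source or destination later.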

---

Data engineering requires knowledge of programming languages (such as Python, Java, or Scala), database systems, big data technologies, cloud platforms, data modeling, and data warehousing concepts. Data engineers also need to keep up with the evolving landscape of data technologies and best practices to ensure efficient and effective data management.