Data Engineering #9
@nitinjethwani7 @sahithi02 @anuraagbhavaraju let me know when the data is uploaded to the repo, as discussed.
Also, if anyone is interested in the data engineering part, I can give a walkthrough.
Yes, I am interested, please guide!
I am interested too!
@sahithi02 the above links should give you a fair idea of how to get started.
@nitinjethwani7 @anuraagbhavaraju @sahithi02 the aim is to take the raw source data and, using code, get it into the processed format.
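To make the "raw to processed" idea concrete, here is a minimal sketch using only the Python standard library. The column names `Age` and `Experience` are borrowed from the Dagster example in this thread; the inline `RAW_CSV` sample and the helper names `process` and `to_csv` are hypothetical and only there for illustration. The real dataset may need different cleaning rules.

```python
import csv
import io

# Hypothetical raw input: stray whitespace and one incomplete row.
RAW_CSV = """Age,Experience
34, 6
52,20
,15
61,30
"""


def process(raw_text: str) -> list[dict]:
    """Parse raw CSV text, drop incomplete rows, and cast columns to numbers."""
    rows = []
    for record in csv.DictReader(io.StringIO(raw_text)):
        age = (record.get("Age") or "").strip()
        exp = (record.get("Experience") or "").strip()
        if not age or not exp:
            # Skip rows with a missing value instead of guessing one.
            continue
        rows.append({"Age": int(age), "Experience": float(exp)})
    return rows


def to_csv(rows: list[dict]) -> str:
    """Serialize the processed rows back to CSV text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=["Age", "Experience"], lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    print(to_csv(process(RAW_CSV)))
```

The point is that every step from the raw file to the processed file lives in code, so anyone on the team can rerun it and get the same result.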
Dagster code example (this example uses an older version of the Dagster API):
import pandas as pd
from dagster import pipeline, solid, execute_pipeline
import requests

# Useful link:
# - https://understandingdata.com/list-of-python-assert-statements-for-unit-tests/


@solid
def read_weather_data():
    # Load the raw sample CSV from a local path.
    df_sample = pd.read_csv("/Users/sayantikabanik/Downloads/SA4/ass-p1/sample_ds_sa.csv")
    return df_sample


@solid
def state_count(sample):
    # Count the rows where Age is greater than 50.
    age_greater_than_50 = sample.loc[sample["Age"] > 50, "Age"].count()
    return age_greater_than_50


@solid
def average_cal(sample):
    """
    Info: Calculates the average experience from the sample
    :param sample: DataFrame
    :return: float
    """
    avg_exp = sample.Experience.mean()
    return avg_exp


@solid
def display_results(context, age_greater_than_50, avg_exp):
    context.log.info(f"Count for age >50: {age_greater_than_50}")
    context.log.info(f"Overall avg experience: {avg_exp}")


@solid
def test_cases(avg_exp, count_age):
    # Both values must exist before they are compared.
    assert avg_exp is not None and count_age is not None
    assert avg_exp > 5
    assert count_age > 0


@pipeline
def data_pipeline():
    sample = read_weather_data()
    count_age = state_count(sample)
    avg_exp = average_cal(sample)
    display_results(count_age, avg_exp)
    test_cases(avg_exp, count_age)


if __name__ == "__main__":
    # Run the pipeline defined above.
    result = execute_pipeline(data_pipeline)
The latest dataset is now added to the data folder:
https://github.com/sayantikabanik/FP2/tree/main/forecasting_framework/data
intake
ingest or manually upload to cloud