Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Move starter project into dbt repo #3474

Merged
merged 10 commits into from
Jun 22, 2021
Merged

Conversation

leahwicz
Copy link
Contributor

Addresses issue #3005

Description

I moved the starter_project from its own repo into this repo. I copied the starter project from the dbt-yml-config-version-2 tag.

This will stop cloning the project from the other repo and instead will package the starter project within the dbt release and then copy it to the given directory.

Checklist

  • I have signed the CLA
  • I have run this code in development and it appears to resolve the stated issue
  • This PR includes tests, or tests are not required/relevant for this PR
  • I have updated the CHANGELOG.md and added information about my change to the "dbt next" section.

@cla-bot cla-bot bot added the cla:yes label Jun 18, 2021
@leahwicz leahwicz requested review from a team, gshank, kwigley and jtcohen6 and removed request for a team June 18, 2021 12:49
@leahwicz
Copy link
Contributor Author

@jtcohen6 - do we need a changelog for changes that aren't customer facing?

@kwigley - can we test packaging this up for distribution to see if it works? I just really want to check that the files are included.

'py.typed',
]
},
include_package_data = True,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A couple articles I read said to use include_package_data instead of package_data with the MANIFEST.in file so that's why I made this change. You don't have to list the files then.

Example:
https://newbedev.com/how-include-static-files-to-setuptools-python-package

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really like it! We did the same in dbt-spark a few months ago: dbt-labs/dbt-spark#151. To quote a wise person, "This is much easier."

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🙏

@leahwicz leahwicz temporarily deployed to Postgres June 18, 2021 12:54 Inactive
@leahwicz leahwicz temporarily deployed to Bigquery June 18, 2021 12:54 Inactive
@leahwicz leahwicz temporarily deployed to Redshift June 18, 2021 12:54 Inactive
@leahwicz leahwicz temporarily deployed to Redshift June 18, 2021 12:54 Inactive
@leahwicz leahwicz temporarily deployed to Snowflake June 18, 2021 12:54 Inactive
@leahwicz leahwicz temporarily deployed to Snowflake June 18, 2021 12:54 Inactive
Copy link
Contributor

@gshank gshank left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

Copy link
Contributor

@jtcohen6 jtcohen6 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work!! I just took this for a spin, the new approach seems great.

  • Did we lose the model files (models/example/*.sql) themselves?
  • Have you checked in with our Experience colleagues about whether this approach is something they could leverage in the future (rather than reimplementing)?
  • I left some stylistic comments, things I've long been annoyed by but never mustered the energy to change. Now's our chance!
  • I also left a comment around some related-but-separate improvements to init. It shouldn't be considered blocking to this PR, but if we could find a neat way to do it, it will have been a long time coming.
  • While taking this for a spin, I played around with a fix to dbt init --adapter default to the available adapter #2814. I'll open a separate PR for that based off this branch.

core/dbt/include/starter_project/dbt_project.yml Outdated Show resolved Hide resolved
core/dbt/include/starter_project/dbt_project.yml Outdated Show resolved Hide resolved
Comment on lines +5 to +10
name: 'my_new_project'
version: '1.0.0'
config-version: 2

# This setting configures which "profile" dbt uses for this project.
profile: 'default'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would love if we could figure out a way to automatically populate name and profile with the project_name argument supplied to dbt init. Right now, that project_name arg is just used to name the file directory, but not the actual project name. Pretty confusing! Also, we tend to discourage using a profile named default, and yet here we are...

I know all we're really doing here is shutil.copytree. Is there any sane way to try editing the files after copying?

For reference, I was just triaging a related issue, so I mentioned this over there as well: #3462 (comment)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have a lot of expertise with jinja templates :). Perhaps we could make a dbt_project jinja template and substitute the project name. Or just do a python substitution if it's simpler. I guess jinja would only make sense if there are other ways we could leverage it to customize the project file.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Heh, you're not wrong. Back in the day, we did a lot of customization for client projects over here: https://github.com/fishtown-analytics/dbt-init

The purpose there was cross-applying some pretty opinionated configs for a given adapter. I think, for our purposes, we shouldn't go much further beyond name and profile.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll take a crack at doing this in another PR to follow this one up. So we want project_name to be the name and profile field in the dbt_project.yml file (just to verify)?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that's right!

'py.typed',
]
},
include_package_data = True,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really like it! We did the same in dbt-spark a few months ago: dbt-labs/dbt-spark#151. To quote a wise person, "This is much easier."

@leahwicz
Copy link
Contributor Author

* Did we lose the model files (`models/example/*.sql`) themselves?

🤦‍♀️ I had my global .gitignore file set to filter out sql and I missed this so good catch

leahwicz and others added 2 commits June 21, 2021 19:16
Co-authored-by: Jeremy Cohen <jeremy@fishtownanalytics.com>
Co-authored-by: Jeremy Cohen <jeremy@fishtownanalytics.com>
@leahwicz leahwicz temporarily deployed to Postgres June 21, 2021 23:22 Inactive
@leahwicz leahwicz temporarily deployed to Bigquery June 21, 2021 23:22 Inactive
@leahwicz leahwicz temporarily deployed to Bigquery June 21, 2021 23:22 Inactive
@leahwicz leahwicz temporarily deployed to Snowflake June 21, 2021 23:22 Inactive
@leahwicz leahwicz temporarily deployed to Snowflake June 21, 2021 23:22 Inactive
@leahwicz leahwicz temporarily deployed to Redshift June 21, 2021 23:22 Inactive
@leahwicz leahwicz temporarily deployed to Redshift June 21, 2021 23:22 Inactive
@leahwicz leahwicz temporarily deployed to Postgres June 22, 2021 13:21 Inactive
@leahwicz leahwicz temporarily deployed to Redshift June 22, 2021 13:21 Inactive
@leahwicz leahwicz temporarily deployed to Redshift June 22, 2021 13:21 Inactive
@leahwicz leahwicz temporarily deployed to Bigquery June 22, 2021 13:21 Inactive
@leahwicz leahwicz temporarily deployed to Bigquery June 22, 2021 13:21 Inactive
@leahwicz leahwicz temporarily deployed to Snowflake June 22, 2021 13:21 Inactive
@leahwicz leahwicz temporarily deployed to Snowflake June 22, 2021 13:21 Inactive
Copy link
Contributor

@kwigley kwigley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 looks good!

I confirmed that the source and wheel distributions include the starter project files. In the dbt/core directory, I ran python setup.py sdist bdist_wheel which creates a source (*.tar.gz) and a wheel (*.whl) distribution. I installed these in separate temporary virtualenvs and confirmed the project existed and init command worked for both.

I think if we want a comprehensive project initialization, we could consider using a scaffolding tool like cookiecutter. This might be overkill for the current starter project thought.

@jtcohen6
Copy link
Contributor

I think if we want a comprehensive project initialization, we could consider using a scaffolding tool like cookiecutter. This might be overkill for the current starter project thought.

We've had this idea before: dbt-labs/dbt-init#31. I'm open to it! Agree that it's probably not necessary for now, but might be a good idea if we want init to do something more complex than just copy files (#3462). I also like how interactive cookiecutter is with setup Q&As, it inspired some of my thinking about an interactive install experience over in #3361.

@leahwicz leahwicz merged commit c794600 into develop Jun 22, 2021
@leahwicz leahwicz deleted the leahwicz/starter-repo-move branch June 22, 2021 15:03
iknox-fa pushed a commit that referenced this pull request Feb 8, 2022
Addresses issue #3005

automatic commit by git-black, original commits:
  c794600
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants