Adding anaconda environment file #21
Also, once this is merged, I will add documentation on how to get set up using Anaconda, either here: https://github.com/sfbrigade/nltweets/wiki/2.-Getting-Started or on a separate Development Environment page.
Great with me, but I really want @pahdo and @nhilton92 to sign off.
@VincentLa A section on getting started with Anaconda would be great! Put it wherever you think it fits, either on the Getting Started page or on a new one. A spot right after Project Architecture might be a great place for learning resources.
Anaconda is good and is the industry standard for data science. Let's use it. I do have a concern that we're relying on too many different packages that do the same thing. From a quick scan, we have 2 packages for text preprocessing (nltk, spacy) and 3 packages for LDA (scikit-learn, guidedlda, and gensim). In a project with a small number of contributors, I don't see this as a problem. In a large project, however, (1) including redundant packages may fragment which packages we use to accomplish the same task, and (2) that fragmentation will make it very difficult to refactor and remove the redundant packages later. I think we should discuss whether we are okay with having these redundant packages before finalizing anything about our development environment.
However, the discussion above shouldn't block moving forward with Anaconda. But let's resolve this before committing our
@pahdo Super interesting! I think your points are great in terms of long-term project organization and sustainability, and we should definitely try to enforce centralized ways of doing things in "production". In the meantime, though, we need to let individuals prototype and test ideas quickly with whatever packages they want. In a "production" environment we'll definitely want to standardize, but for now I don't see this issue as blocking, especially since we have a small team, as you said. What do you think?
environment.yml
- ipykernel=4.6.1
- matplotlib==3.0.2
- nltk==3.4
- numpy=1.15.4
This would need to be numpy==1.15.4
I think I agree with @VincentLa14 here: Anaconda is definitely a good way to manage environments, especially in the development phase of this project. I also agree that, whilst we are building things out and before we put anything into production, it should be fine to have a variety of similar packages in our environment; we can choose one or the other once we have found something that works and start committing production jobs.
Looks good pending the mistake in line 9
environment.yml
- conda-forge
- defaults
dependencies:
- ipykernel=4.6.1
ipykernel==4.6.1
environment.yml
- nltk==3.4
- numpy==1.15.4
- pandas==0.23.4
- python=3.6.0
python==3.6.0
@nhilton92 I think I fixed all the typos.
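The review comments above all flag the same slip: some pins use a single `=` while others use `==`. Conda's match-spec syntax actually accepts both forms (a single `=` is a fuzzy match, roughly `1.15.4*`, while `==` is an exact match), so mixing them is an inconsistency rather than a hard error. A minimal sketch of a check that could catch this before review is below. `find_loose_pins` is a hypothetical, stdlib-only helper, not part of the repo; it assumes one package per line in the `dependencies:` section and does not try to be a full YAML parser.

```python
import re

# Matches a dependency line such as "  - numpy==1.15.4" or "- numpy=1.15.4".
# Group 1 is the package name; group 2 is the pin operator ("==", "=", or None).
# Assumption: one package per line; build strings or other operators are ignored.
DEP_LINE = re.compile(r"^\s*-\s*([A-Za-z0-9_.\-]+)\s*(==|=)?")


def find_loose_pins(env_text):
    """Return package names under 'dependencies:' pinned with a single '='."""
    loose = []
    in_deps = False
    for line in env_text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue  # skip blank lines and comments
        # A top-level key starts at column 0 and is not a list item ("- ...").
        if not line[0].isspace() and not stripped.startswith("-"):
            in_deps = stripped.split(":")[0] == "dependencies"
            continue
        if not in_deps:
            continue  # e.g. entries under 'channels:'
        match = DEP_LINE.match(line)
        if match and match.group(2) == "=":
            loose.append(match.group(1))
    return loose


# Example input assembled from the diff excerpts above (not the full file).
EXAMPLE = """channels:
  - conda-forge
  - defaults
dependencies:
  - ipykernel=4.6.1
  - matplotlib==3.0.2
  - numpy=1.15.4
  - python=3.6.0
"""

print(find_loose_pins(EXAMPLE))  # ['ipykernel', 'numpy', 'python']
```

Run against the real file with something like `find_loose_pins(open("environment.yml").read())`; an empty list means every pin uses `==`. Matching lines with a regex rather than PyYAML keeps the check dependency-free, at the cost of only handling simple one-package-per-line files.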
1. Brief summary of what this PR accomplishes (140 characters or fewer. If you have trouble describing what you are doing in this length, consider breaking the PR into multiple ones.)
Adds Anaconda Environment File
2. Link to GitHub Issue
Issue #6; also potentially replaces #10
3. More detailed description and other questions to address in code review
As discussed here: https://sfbrigade.slack.com/archives/CEL6C4Q49/p1547185246007700, I'd like to recommend using Anaconda as the environment for this project. I am definitely open to other recommendations, but please see the Slack thread for my reasons.
4. Remember to tag reviewers!