Adding anaconda environment file #21

Merged
merged 5 commits into from
Jan 24, 2019

VincentLa14
Member

@VincentLa14 VincentLa14 commented Jan 15, 2019

1. Brief summary of what this PR accomplishes (140 characters or less. If you have trouble describing what you are doing in this length, consider breaking the PR into multiple ones.)

Adds Anaconda Environment File

2. Link to GitHub Issue

Issue #6 ; Also potentially replaces #10

3. More detailed description and other questions to address in code review

As discussed here: https://sfbrigade.slack.com/archives/CEL6C4Q49/p1547185246007700, I'd like to recommend using Anaconda as the environment for this project. I am definitely open to other recommendations, but please see the Slack thread for my reasons.

4. Remember to tag reviewers!

@VincentLa14
Member Author

VincentLa14 commented Jan 15, 2019

Also, once this is merged, I will add documentation here: https://github.com/sfbrigade/nltweets/wiki/2.-Getting-Started (or maybe on a separate Development Environment page?) on how to get set up using Anaconda.

@VincentLa14 VincentLa14 mentioned this pull request Jan 15, 2019
Collaborator

@frhino frhino left a comment

Great with me, but I really want @pahdo and @nhilton92 to sign off.

@frhino
Collaborator

frhino commented Jan 16, 2019

@VincentLa a section on getting started with Anaconda would be great! Put it wherever you think it fits, either on the Getting Started page or on a new one. I think after Project Architecture might be a great place to put learning resources.

@pahdo
Collaborator

pahdo commented Jan 16, 2019

Anaconda is good and the industry standard for data science. Let's use it. I do have a concern that we're relying on too many different packages that do the same thing for us. From a quick scan, we have 2 packages that we use for text preprocessing (nltk, spacy) and 3 packages that we use for LDA (scikit-learn, guidedlda, and gensim). In a project with a small number of contributors, I don't see this as a problem. However, in a large project, (1) including redundant packages may fragment which packages we use to accomplish the same task, and (2) this fragmentation will make it very difficult to refactor and remove these redundant packages in the future. I think we should discuss whether we are okay with having these redundant packages before finalizing anything about our development environment.
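
For illustration, here is a minimal sketch of how the overlap described above could be checked against a dependency list. The task grouping is taken from this comment; the function name `find_overlaps` and the sample dependency list are hypothetical and not part of the repository.

```python
# Task-to-package grouping as listed in the comment above.
TASK_PACKAGES = {
    "text preprocessing": {"nltk", "spacy"},
    "LDA": {"scikit-learn", "guidedlda", "gensim"},
}


def find_overlaps(installed, task_packages=TASK_PACKAGES):
    """Return {task: sorted package names} for tasks covered by 2+ installed packages."""
    installed = {name.lower() for name in installed}
    overlaps = {}
    for task, candidates in task_packages.items():
        present = sorted(candidates & installed)
        if len(present) > 1:
            overlaps[task] = present
    return overlaps


if __name__ == "__main__":
    # Hypothetical dependency list, for demonstration only.
    deps = ["nltk", "spacy", "gensim", "numpy", "pandas"]
    for task, packages in find_overlaps(deps).items():
        print(f"{task}: {', '.join(packages)}")
```

A report like this only surfaces candidates for consolidation; deciding which package to keep is still a team discussion.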

@pahdo
Collaborator

pahdo commented Jan 16, 2019

However, the discussion above ^ shouldn't block moving forward with using Anaconda. But let's resolve this before committing our environment.yml.

@VincentLa14
Member Author

VincentLa14 commented Jan 16, 2019

@pahdo super interesting. I think your points are great in terms of long-term project organization and sustainability, and we should definitely try to enforce centralized ways of doing this in "production".

In the meantime, though, we need to let individuals prototype and test things quickly using whatever packages they want. In a "production" environment we'll definitely want to standardize, but for now I don't see this issue as blocking, especially given that we have a small team, as you said. What do you think?

environment.yml Outdated
- ipykernel=4.6.1
- matplotlib==3.0.2
- nltk==3.4
- numpy=1.15.4
Collaborator

This would need to be numpy==1.15.4

@nhilton92
Collaborator

I think I agree with @VincentLa14 here: Anaconda is definitely a good way to manage environments, especially in the development phase of this project. I also agree that whilst we are building things out, and before we put anything into production, it should be OK to have a variety of similar packages in our environment. We can choose one or the other once we have found something that works and start committing production jobs.

Collaborator

@nhilton92 nhilton92 left a comment

Looks good pending the mistake in line 9

environment.yml Outdated
- conda-forge
- defaults
dependencies:
- ipykernel=4.6.1
Collaborator

ipykernel==4.6.1

environment.yml Outdated
- nltk==3.4
- numpy==1.15.4
- pandas==0.23.4
- python=3.6.0
Collaborator

python==3.6.0

@VincentLa14
Member Author

@nhilton92 I think I fixed all typos.
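
For reference, the pin-style fixes requested in the review comments could be checked mechanically. The sketch below scans environment-file text for dependency lines that use a single `=` instead of `==`. The function name, the regular expression, and the sample file contents (including the environment name) are assumptions for illustration, not the actual `environment.yml` from this PR. To my understanding conda itself accepts both forms (a single `=` is a looser match than `==`), so this only enforces the convention the reviewers asked for.

```python
import re

# Matches list items of the form "- name=version" with exactly one "=".
# Lines pinned with "==" or lines without a version are not matched.
SINGLE_EQUALS = re.compile(r"^\s*-\s*([A-Za-z0-9_.\-]+)=(?!=)([^=\s]+)\s*$")


def find_single_equals_pins(yaml_text):
    """Return (line_number, package, version) for each 'name=version' line."""
    hits = []
    for number, line in enumerate(yaml_text.splitlines(), start=1):
        match = SINGLE_EQUALS.match(line)
        if match:
            hits.append((number, match.group(1), match.group(2)))
    return hits


# Hypothetical sample built from the fragments quoted in the review comments.
SAMPLE = """\
name: example-env
channels:
  - conda-forge
  - defaults
dependencies:
  - ipykernel=4.6.1
  - matplotlib==3.0.2
  - nltk==3.4
  - numpy=1.15.4
"""

if __name__ == "__main__":
    for number, package, version in find_single_equals_pins(SAMPLE):
        print(f"line {number}: {package}={version} -> {package}=={version}")
```

The check is deliberately line-based so it needs no YAML parser; a stricter version would load the file and inspect only the `dependencies` list.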

@VincentLa14 VincentLa14 merged commit 63b572c into master Jan 24, 2019
@VincentLa14 VincentLa14 deleted the adding-anaconda-environment-file branch January 24, 2019 03:58
4 participants