Fix #563: Managing Data Storage On An External Hard Drive #565

Closed
wants to merge 11 commits into from

Conversation

dashohoxha
Contributor

Fix #563

```dvc
$ sudo su
# cd /mnt/data/
# git init
```

Member

This is confusing. Even users with a single partition don't use /mnt/data as a project directory. That only makes sense if you want to organize a "data registry" out of it and then use dvc import to connect the data to your project.

cleaning them up. You could do it like this:

```dvc
$ dvc add /mnt/data/raw
```

Member

I would definitely split add and run, and explain that for the simple case just add is enough. Pipelines should come a bit later.

Member

@shcheklein shcheklein left a comment

@dashohoxha good stuff. I've left some comments; please also check the discussion in #563.

@dashohoxha dashohoxha changed the title from "Fix #563: Use Case: Huge data on an external local drive" to "Fix #563: Managing Data Storage On An External Hard Drive" on Aug 17, 2019
@dashohoxha
Contributor Author

I replaced /mnt/data with /mnt/external-drive to see how it looks. But it can be changed or reverted if needed.

@shcheklein
Member

@dashohoxha

> It is easier to review a rendered document page than to review Markdown code

There are two easy ways to mitigate this. If you use a branch (let me know if you need permissions for this) instead of a fork, Heroku will automatically deploy the PR for you, like this:

[Screenshot: Screen Shot 2019-08-22 at 4 41 21 PM]

(I did this manually this time)

Second way - run the Node server locally. It's quite straightforward and is described in the documentation contributing guide.

> The review process does not block working with the next page

Not necessarily. It's fine to start working on something next when one PR is under review.

> I replaced /mnt/data with /mnt/external-drive to see how it looks. But it can be changed or reverted if needed.

I don't see how this changes it. I haven't seen projects named external-drive. I haven't seen git init being run in the root of the drive. Again, maybe I'm still missing the whole point here.

> I haven't seen any projects that have a writing style guide for the docs

There are a lot of such projects. One of the nice examples: https://docs.mattermost.com/process/documentation-guidelines.html

> If you insist, I can use data in singular, although I am not convinced that this is right

Thank you! 🙏

Again, I don't want to discourage you from suggesting new stuff, improvements, etc. You have brought a lot of good topics to think about already.

@dashohoxha
Contributor Author

> If you use a branch (let me know if you need permissions for this) instead of a fork

Yes, I think I need permissions (if no one has already given them to me).

> Second way - run the Node server locally.

I do, but this may be difficult for the reviewers (if they have to clone my repo).

> Not necessarily. It's fine to start working on something next when one PR is under review.

Maybe, but if both PRs happen to touch the same files, this may create conflicts. It has happened to me in the past, which is why I say it is bad practice. The conflicts can be resolved of course, but it is an annoyance.

> I don't see how this changes it. I haven't seen projects named external-drive. I haven't seen git init being run in the root of the drive. Again, maybe I'm still missing the whole point here.

This is not usual, but it is a possible solution. In the next paragraph I say that this may not be preferable. What would you suggest about it? Should we not mention it at all?

> One of the nice examples: https://docs.mattermost.com/process/documentation-guidelines.html

The note at the top of it is nice, because I think that the very idea of having to obey style rules may actually impede contributions. On the other hand, I expect to be paid for this job, so I don't mind if I have to follow grammar rules, even if they are stupid. Sorry for being brutally honest.

> Again, I don't want to discourage you from suggesting new stuff, improvements, etc. You have brought a lot of good topics to think about already.

Thanks for letting me know. Because sometimes I feel like not everyone appreciates my comments and maybe I am being perceived as a troublemaker.

@shcheklein
Member

@dashohoxha

> Yes, I think I need permissions (if no one has already given them to me).

Done! I've added you to the collaborators. If you use branches, they are deployed automatically and are very easy to review.

> The conflicts can be resolved of course, but it is an annoyance.

Agreed. I would try to avoid working on the same files, especially if they are being changed a lot.

> What would you suggest about it? Should we not mention it at all?

Probably, yes. I need a little more time to provide you with constructive feedback on this PR. It's not your problem; even for me it's not easy to come up with a good structure and language off the top of my head for this external stuff, especially considering all the other ways of managing "external" data. I'll try to do a full review once again and be more constructive.

> No contribution will be rejected due to non-conforming style, although it might be edited.

Totally agree with this! And that's what we do with external contributions. You are part of the team for this period of time, and we consider you an "internal" contributor. There is no one else to "edit" after us :)

> even if they are stupid. Sorry for being brutally honest.

I respect your directness, but please respect other people's opinions and be as constructive as possible.

> Because sometimes I feel like not everyone appreciates my comments and maybe I am being perceived as a troublemaker.

Sorry that you see it that way. On the contrary, I really like the topics you've brought up on discuss.dvc.org (dvc run complexity, for example). They resonate very well with what we have been discussing internally.

Again, give me a bit of time to wrap my head around all this external data management stuff. We are traveling now and our schedule is a little hectic; that's why the review feels slow and is not deep/actionable.

@dashohoxha
Contributor Author

> Again, give me a bit of time to wrap my head around all this external data management stuff.

I am not in a hurry at all. I have not yet finished reading some tutorials and other things (about DVC).

@jorgeorpinel
Contributor

jorgeorpinel commented Oct 19, 2019

Hi. Is this PR still in progress? Is it still relevant? It seems abandoned. Thanks.

P.S. I'm asking because I could address a lot of my own comments above to get it moving forward, but I don't want to waste time if this is no longer valid or a priority.

@shcheklein
Member

@jorgeorpinel it's an important PR. I just don't yet have a clear structure for the Managing External Data section in my head, or where this PR (or parts of it) will fit into it. The same goes for #455, for example, which is also almost done.

@jorgeorpinel
Contributor

Alright, I addressed all my own feedback. Please lmk if you need more help reviewing this one again. I'll unsubscribe for now.

@dashohoxha
Contributor Author

Closed in favor of: #732

@dashohoxha dashohoxha closed this Oct 24, 2019
@dashohoxha dashohoxha mentioned this pull request Oct 25, 2019
10 tasks
Successfully merging this pull request may close these issues.

how: use DVC when data is stored in an external drive
3 participants