Is submission ID supposed to be unique? #5

Closed
zross opened this issue Apr 28, 2021 · 4 comments

Comments


zross commented Apr 28, 2021

@abelvaldivia I see that you're removing duplicates based on submission ID and date. Does that mean submissionid can be repeated?

> dim(hhs)
[1] 17286   243
> length(unique(hhs$submissionid))
[1] 17276
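
The two counts above differ by 10 (17286 rows versus 17276 unique IDs), so some IDs appear more than once. A minimal sketch of how the repeated IDs could be listed for inspection, using only base R; `hhs_toy` and its values are made-up stand-ins for the real `hhs` data, which is not shown in this thread:

```r
# Hypothetical toy data standing in for hhs (the real table has 243 columns).
hhs_toy <- data.frame(
  submissionid = c("a1", "a2", "a2", "a3"),
  updatedat    = c("2021-04-01", "2021-04-02", "2021-04-05", "2021-04-03"),
  stringsAsFactors = FALSE
)

# Number of rows beyond the first occurrence of each ID
# (this is the gap between nrow() and length(unique())).
n_extra <- sum(duplicated(hhs_toy$submissionid))

# IDs that occur more than once.
dup_ids <- unique(hhs_toy$submissionid[duplicated(hhs_toy$submissionid)])

# All rows sharing a repeated ID, so they can be compared side by side.
dup_rows <- hhs_toy[hhs_toy$submissionid %in% dup_ids, ]
print(dup_rows)
```

Looking at `dup_rows` should show whether the repeated records differ only in their timestamps or in other columns as well.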

abelvaldivia commented Apr 28, 2021

@zross The submission ID is supposed to be unique. If dim() and length(unique()) don't agree on the number of rows, it is because some records are repeated. In that case, the most recent record with a given submission ID should be kept and the others removed. This code, which I wrote in the process-raw-data.R file, is supposed to do that before the data is loaded into the app:

hhs <- ALL_hhs %>%
  # Drop the timestamp columns so rows that differ only in timestamps collapse
  select(-c(updatedat, endformtimestamp, startformtimestamp)) %>%
  unique() %>%
  # Re-attach the most recent updatedat date for each submission ID
  left_join(
    ALL_hhs[, c("submissionid", "updatedat", "startformtimestamp", "endformtimestamp")] %>%
      group_by(submissionid) %>%
      summarise(updatedat = max(as.Date(updatedat))),
    by = "submissionid"
  )
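
The rule stated in this comment (keep only the most recent record for each submission ID) can also be written directly. The following is a sketch under assumptions, not the code from process-raw-data.R: `ALL_hhs_toy` and the `answer` column are hypothetical, and `updatedat` is assumed to parse with `as.Date()`.

```r
library(dplyr)

# Hypothetical toy data; "answer" stands in for the survey columns.
ALL_hhs_toy <- data.frame(
  submissionid = c("a1", "a2", "a2", "a3"),
  updatedat    = c("2021-04-01", "2021-04-02", "2021-04-05", "2021-04-03"),
  answer       = c("x", "y", "z", "w"),
  stringsAsFactors = FALSE
)

hhs_toy <- ALL_hhs_toy %>%
  mutate(updatedat = as.Date(updatedat)) %>%
  # Newest records first, so the first row seen for each ID is the latest one.
  arrange(desc(updatedat)) %>%
  # Keep exactly one row per submission ID, with all of its columns.
  distinct(submissionid, .keep_all = TRUE)

# Sanity check: every submission ID now appears once.
stopifnot(!anyDuplicated(hhs_toy$submissionid))
```

Because `distinct()` is keyed on `submissionid` alone, this guarantees one row per ID even when the repeated records differ in columns other than the timestamps. If two records share the same date, which one survives depends on their original order.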


zross commented Apr 28, 2021

That's the code I ran, so you should double-check it. It may not be doing what you think it's doing. Do you want to take a look?
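
One possible explanation, which the thread does not confirm: `unique()` only collapses rows that are identical in every remaining column, so two records with the same submission ID that differ in any non-timestamp column both survive. A toy reproduction of the same select / unique / join pattern, where `ALL_toy` and the `answer` column are hypothetical:

```r
library(dplyr)

# Hypothetical data: the two "a2" records differ in a non-timestamp column.
ALL_toy <- data.frame(
  submissionid       = c("a1", "a2", "a2"),
  answer             = c("x", "y", "z"),
  updatedat          = c("2021-04-01", "2021-04-02", "2021-04-05"),
  startformtimestamp = c("t1", "t2", "t3"),
  endformtimestamp   = c("t1", "t2", "t3"),
  stringsAsFactors = FALSE
)

# Most recent updatedat per submission ID (one row per ID).
latest <- ALL_toy %>%
  group_by(submissionid) %>%
  summarise(updatedat = max(as.Date(updatedat)))

result <- ALL_toy %>%
  select(-c(updatedat, endformtimestamp, startformtimestamp)) %>%
  unique() %>%                          # the two "a2" rows are not identical
  left_join(latest, by = "submissionid")

nrow(result)                            # still one row per original record
anyDuplicated(result$submissionid) > 0  # TRUE when an ID is still repeated
```

In this toy case both "a2" rows remain and each receives the same latest date, so the join does not remove the older record.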

abelvaldivia commented

I will check

abelvaldivia commented

@zross I just ran it and got the same number of submission IDs. Maybe run the entire code in that file?
image

zross closed this as completed May 18, 2021