Fails to import variable with duplicate labels #42
Comments
Certainly it would not be too much to ask, and it should not be too much work. But I have to make time for it, because I am in the middle of the semester. Any ideas about a scheme for de-duplicating labels? What would you prefer?
By the way, it might be preferable to import the data in two steps:
I am fine with conflating items that have different values but identical labels into one label, because I consider the label the "real state" of the data. The user is warned, and you already provide duplicated_labels() to display the situation.
That would mean that the original codes would be changed. Maybe I should make this one option for de-duplicating. Another option would be to modify the labels.
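The first option (conflating codes that share a label) can be sketched in base R. This is only an illustration, not memisc's implementation: the vectors `codes`, `labs`, and `x` are made-up data, and every district name other than Hamirpur is hypothetical.

```r
# Hypothetical value labels: codes 1 and 3 share the label "Hamirpur".
codes <- c(1, 2, 3, 4)
labs  <- c("Hamirpur", "Kangra", "Hamirpur", "Mandi")

# Labels that occur more than once.
dup_labs <- unique(labs[duplicated(labs)])

# Map every code to the first code carrying the same label.
# match(labs, labs) returns the position of the first occurrence of each label.
canonical <- codes[match(labs, labs)]

# Hypothetical data vector holding raw codes; note that the original codes change.
x <- c(1, 3, 2, 4, 3)
x_conflated <- canonical[match(x, codes)]

# The label table shrinks to one entry per label.
keep      <- !duplicated(labs)
new_codes <- codes[keep]
new_labs  <- labs[keep]
```

After this step the distinction between the two "Hamirpur" codes is gone from the data, which is the trade-off discussed above.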
Yeah, just paste()-ing the numeric code and the label together would preserve the distinction in the original data, and it would be fairly easy for the user to conflate them programmatically later on, particularly if you use an uncommon separator character/string with paste(): paste(numeric, label, sep = "§") or something.
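A rough base-R sketch of this paste() idea follows. It is an assumption-laden illustration rather than anything memisc does: `codes` and `labs` are made-up data, "Kangra" is a hypothetical label, only the duplicated labels are prefixed (one possible design choice), and the labels are assumed never to contain the separator themselves.

```r
# Hypothetical value labels with one duplicated label.
codes <- c(1, 2, 3)
labs  <- c("Hamirpur", "Kangra", "Hamirpur")

sep <- "\u00a7"  # the section sign, an uncommon separator

# Prefix the numeric code only where the label is duplicated.
is_dup      <- labs %in% labs[duplicated(labs)]
unique_labs <- ifelse(is_dup, paste(codes, labs, sep = sep), labs)

# Later, the original labels can be recovered by splitting on the separator
# and keeping the last piece (assumes labels never contain the separator).
parts     <- strsplit(unique_labs, sep, fixed = TRUE)
recovered <- vapply(parts, function(p) p[length(p)], character(1))
```

Because the separator is unlikely to appear in real labels, a user who wants to conflate the items again can strip the prefix without ambiguity.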
Is it possible to use recode() in the second-to-last step to fix the duplicated labels problem?
Yep, that is the point of the whole infrastructure of "data.set" and "item" objects in memisc.
Oh, great! So this becomes a support question, then. Here is the output of duplicated_labels():
I tried to fix the first duplicated item, Hamirpur, with the following recode
and it didn't complain, but it still reports the same problem
I see. Well, recoding only changes the codes but not the labels. Still some work for me to do then ...
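Why the duplicate persists can be illustrated with a plain named vector standing in for a label table. This is a conceptual base-R sketch under that assumption, not memisc's internal representation; `value_labels` and `x` are made-up, and "Kangra" is a hypothetical label.

```r
# Hypothetical label table: names are the labels, values are the codes.
value_labels <- c(Hamirpur = 1, Kangra = 2, Hamirpur = 3)

# Hypothetical data vector of raw codes.
x <- c(1, 3, 2, 3)

# Recoding touches only the data: code 3 becomes code 1.
x[x == 3] <- 1

# The label table is unchanged, so the duplicated label is still reported.
dups_after_recode <- anyDuplicated(names(value_labels)) > 0

# Only editing the label table itself removes the duplicate.
value_labels <- value_labels[!duplicated(names(value_labels))]
dups_after_edit <- anyDuplicated(names(value_labels)) > 0
```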
I just found and updated some code I wrote earlier to deduplicate labels or codes. It is in the attached zip archive along with some example code and example output (dedup-labels.zip). I will include it in the next memisc release. But for now, using the code in the zip file may be a quick fix for your problem.
There is now a new release of memisc 0.99.23 that includes a function
Thanks a lot for fixing this! Now I just need a little help to understand where I put the call to Everything seems to work for
Codebook looks fine now
Running codebook() on unwashed data
shows the duplicated labels:
But as.data.frame() still gives an error, albeit another one now:
Thanks in advance for any guidance on how to resolve this.
The following code works, though:
Is it too much to ask that memisc would be able to cope with duplicated labels?
I work a lot with data from the Demographic and Health Surveys (DHS), and some of those files are so big that importing them with read.spss() requires amounts of RAM not found in most computers. Many or even most of the hundreds of files from DHS have duplicated labels in them. Getting memisc to work with such files would really help my work. Currently I have a computer with 68 GB RAM, so I manage, but I want others to be able to use my code.
Kind regards,
Hans Ekbrand, University of Gothenburg, Sweden.