arabic data #295
Processed files: @mcfrank Just checking, this language should be labelled "Arabic (Saudi)"? And also will need contributors / citations for these data (: |
JISH: Alroqi: |
@alvinwmtan A couple of problems. First, WS is too big to fit in a single database row. There are 1079 items, and the program creates a 15-character text field for each, which exceeds the row size that MySQL (the database we're using) allows. I'm trying to find a solution but have made no progress yet (and I'm not confident). Second, WG has a new category (negation_words), which I need to add to the categories.csv file with a lexical_category and a lexical_class. I have used function_words for both for the time being, as this seems to be used quite a lot. Finally, some of the cells contain "Understands ONLY, Understands & Says". They should be one or the other. No cells have the two values reversed, so I think this is the actual value. I can map these so that they are recorded as produces, BUT I will need to amend the file so that these cells use a semicolon instead of a comma, because the comma delimits fields. |
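For illustration, the clean-up of the combined cells described above could look roughly like the sketch below. It assumes the two labels are spelled exactly as quoted in this comment and that a combined cell should collapse to the "says" value; the function name `normalize_response` is hypothetical and is not part of the actual import code.

```python
# Hypothetical helper: collapse "Understands ONLY, Understands & Says"
# into a single response value. Labels are copied from the comment above.
UNDERSTANDS = "Understands ONLY"
PRODUCES = "Understands & Says"


def normalize_response(cell: str) -> str:
    """Return one response label for a cell that may list both."""
    # Accept either a comma or a semicolon between the two labels.
    parts = [p.strip().lower() for p in cell.replace(";", ",").split(",")]
    if PRODUCES.lower() in parts:
        # "Says" implies "understands", so the stronger value wins.
        return PRODUCES
    if UNDERSTANDS.lower() in parts:
        return UNDERSTANDS
    # Anything else (including empty cells) is left as it was.
    return cell.strip()
```

Collapsing the cell before the file is written also removes the embedded comma, so the field delimiter no longer needs to change for these rows.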
|
Arabic (Saudi) WG is now available to test. I cannot load WS until we have a decision about whether we could use u instead of understands and p instead of produces. This would need to apply across all datasets and would impact the shiny app, as previously mentioned. |
(fixing by switching to "u" and "p", as in #298) |
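A minimal sketch of what accepting both spellings might look like while datasets are migrated. The mapping and the function name `encode_value` are assumptions made for illustration; they are not taken from the import program, wordbankr, or the shiny apps.

```python
# Hypothetical mapping from the long response values to the short codes.
SHORT_CODES = {"understands": "u", "produces": "p"}


def encode_value(value: str) -> str:
    """Return the short code for a response, accepting long or short input."""
    v = value.strip().lower()
    if v in SHORT_CODES:
        return SHORT_CODES[v]
    if v in SHORT_CODES.values():
        # Already a short code.
        return v
    # Leave anything else (for example an empty cell) untouched.
    return value
```

Accepting both forms on input would let existing instruments keep working until they are converted.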
I endorse this suggestion since it may come up again and will generally
save space. But we do need to update the shiny apps as noted. @mikabr may
need to update. Will we need to change all instruments or are "understands"
and "u" now both options?
|
@alvinwmtan We still have an issue here. I am now getting a "Too many columns" error message. I've done some reading about this, and I cannot increase any parameters to allow more fields. I therefore propose we split the Arabic (Saudi) WS into 2 files and hence 2 tables. |
@mcfrank @alvinwmtan |
@HenryMehta Hm okay. Do you know what the column limit is? |
@alvinwmtan It's not actually that simple because it also depends on the column names. I could probably work it out, but it would take some time. I think we should aim to keep the max to 750. |
@HenryMehta Given that the size of the col names also matters, do you think it might be possible to retain the full table if we converted all the colnames to just numbers? That would reduce the size. If not I'll think about how to split the dataset up. |
@alvinwmtan We could try, but I don't know how many columns that would give us, and the names would actually need changing for every study because of the way the application works. We would need to change the code as well, because column names are currently called 'item_xx', where xx is the column number. We could shorten the name to 'ixx', because column names must start with a letter. |
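To make the size of that saving concrete, here is a rough sketch of the renaming and of how many characters of column name it would remove for 1079 items. The helper `shorten_column` is hypothetical, and the sketch does not predict how many columns MySQL would then accept.

```python
import re


def shorten_column(name: str) -> str:
    """Rename 'item_12' to 'i12'; leave any other column name unchanged."""
    match = re.fullmatch(r"item_(\d+)", name)
    return f"i{match.group(1)}" if match else name


# Characters saved across the 1079 WS items mentioned earlier in the thread.
old_names = [f"item_{n}" for n in range(1, 1080)]
new_names = [shorten_column(n) for n in old_names]
saved = sum(len(o) - len(s) for o, s in zip(old_names, new_names))
```

Each name loses the four characters "tem_", so the total saving is small relative to the data fields themselves.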
@HenryMehta Here is one attempt: I've separated the words (WS) and all other item types (WSOther); WS still has >800 items but hopefully it will be okay. The WS from Alroqi is unchanged. Let me know if this split is still too large and I will find a different solution. [ArabicSaudi_WS].csv |
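A sketch of the kind of split described here, separating word items from all other item types. It assumes each item row carries an `item_kind` field whose value is `"word"` for vocabulary items; that field name, the value, and the sample rows in the test are assumptions for illustration, not details confirmed in this thread.

```python
from typing import Dict, List, Tuple

Row = Dict[str, str]


def split_items(rows: List[Row]) -> Tuple[List[Row], List[Row]]:
    """Split item rows into (words, other) based on the assumed item_kind field."""
    words = [r for r in rows if r.get("item_kind") == "word"]
    other = [r for r in rows if r.get("item_kind") != "word"]
    return words, other
```

The first list would feed the WS table and the second the WSOther table, with each row ending up in exactly one of them.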
@alvinwmtan You've split the JISH files but not the Alroqi |
@HenryMehta I believe the Alroqi files are all still within "WS" (only the JISH had items that now fall in "WSOther") |
OK |
@alvinwmtan Deploying to dev now - will need about 40 minutes to load |
I've implemented support for "u" and "p" values in wordbankr, but none of the Saudi Arabic tables seem to have those values, and the WSOther table seems to have zero rows (I'm connecting to
@HenryMehta WS looks good, don't seem to see any WSOther data |
@alvinwmtan try now |
@HenryMehta WS and WSOther look good. I realised I also failed to disambiguate some of the items in the WG; these should be de-conflicted now: ArabicSaudiWG_Alroqi_data.csv |
@alvinwmtan You've re-introduced the cells with "understands only, understands & says" instead of just one. I have previously changed these to "understands & says". I have reapplied this change |
@HenryMehta thanks for catching that; looks good to me now! |
import arabic data from https://github.com/langcog/ArabicCAT