arabic data #295

mcfrank · 2023-05-11T16:29:08Z

import arabic data from https://github.com/langcog/ArabicCAT

alvinwmtan · 2023-07-30T14:39:01Z

Processed files:
[ArabicSaudi_WG].csv
[ArabicSaudi_WS].csv
ArabicSaudiWG_Alroqi_data.csv
ArabicSaudiWG_Alroqi_fields.csv
ArabicSaudiWG_Alroqi_values.csv
ArabicSaudiWS_Alroqi_data.csv
ArabicSaudiWS_Alroqi_fields.csv
ArabicSaudiWS_Alroqi_values.csv
ArabicSaudiWS_JISH_data.csv
ArabicSaudiWS_JISH_fields.csv
ArabicSaudiWS_JISH_values.csv
ArabicSaudi_notes.md

@mcfrank Just checking, this language should be labelled "Arabic (Saudi)"? And also will need contributors / citations for these data (:

mcfrank · 2023-07-31T17:50:46Z

Thanks! This is Arabic (Saudi), and the citation for the JISH data is the manual listed on the CDI website. For the other dataset, I just forwarded all the info I have. Mike

…

alvinwmtan · 2023-07-31T23:43:55Z

JISH:
Contributor: Jeddah Institute for Speech and Hearing
Citation: Dashash, N., & Safi, S. (2014). JISH Arabic Communicative Development Inventory: Saudi population JACDI: User’s guide and technical manual. Jeddah: Jeddah Institute for Speech and Hearing

Alroqi:
Contributors:
Haifa Alroqi, King Abdulaziz University
Alaa Almohammadi, King Abdulaziz University
Khadeejah Alaslani, Purdue University
Citation: TBD

HenryMehta · 2023-11-27T17:30:53Z

@alvinwmtan
I've started on Arabic (Saudi).

A couple of problems. WS is too big to create a database row. There are 1079 items. The program creates a 15 character text field for each and this is too big a database row for MySQL which is the database we're using. I'm trying to find a solution but no progress yet (and I'm not confident).
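For a rough sense of why 1079 fields of 15 characters can overflow a row, the arithmetic can be sketched as below. The 65,535-byte figure is MySQL's documented maximum row size. The `VARCHAR` column type and the `utf8mb4` charset (up to 4 bytes per character) are assumptions, since the thread does not state how the fields are actually declared.

```python
# Rough estimate of the declared size of a wide MySQL row.
# Assumptions (not stated in the thread): VARCHAR columns, utf8mb4 charset.

MYSQL_MAX_ROW_BYTES = 65_535  # documented MySQL row-size limit


def varchar_bytes(max_chars: int, bytes_per_char: int = 4) -> int:
    """Worst-case bytes for one VARCHAR(max_chars) column."""
    payload = max_chars * bytes_per_char
    # VARCHAR stores a 1-byte length prefix if the payload fits in 255 bytes,
    # otherwise a 2-byte prefix.
    length_prefix = 1 if payload <= 255 else 2
    return payload + length_prefix


def row_bytes(n_items: int, max_chars: int, bytes_per_char: int = 4) -> int:
    """Worst-case bytes for a row of n_items identical VARCHAR columns."""
    return n_items * varchar_bytes(max_chars, bytes_per_char)


def fits_in_row(n_items: int, max_chars: int, bytes_per_char: int = 4) -> bool:
    """True if the estimated row stays within the MySQL row-size limit."""
    return row_bytes(n_items, max_chars, bytes_per_char) <= MYSQL_MAX_ROW_BYTES
```

Under these assumptions, `row_bytes(1079, 15)` comes to 65,819 bytes, just over the limit, while a 1-character field per item stays far below it. The real table also has non-item columns, which this estimate ignores.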

WG has a new category (negation_words). I need to add this to the categories.csv file. I need to add it with a lexical_category and a lexical_class. I have used function_words for both for the time being as this seems to be used quite a lot.

Finally, some of the cells have "Understands ONLY, Understands & Says" in them. They should be one or the other. No cells have them reversed so I think this is the actual value. I can link these so that these result in produces BUT I will need to amend the file so these use a semi-colon instead of comma because the comma specifies a different field.
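The cleanup proposed above could look roughly like the sketch below. The function name is hypothetical, and mapping a lone "Understands ONLY" to "understands" and a lone "Understands & Says" to "produces" is an assumption; the thread only settles that the combined cell should count as "produces".

```python
# Minimal sketch of collapsing a raw cell into a single response value.
# Hypothetical helper; not taken from the actual import code.

def normalize_response(cell: str) -> str:
    """Return one response value for a raw cell.

    A cell holding both "Understands ONLY" and "Understands & Says"
    (separated by a comma or a semicolon) is treated as "produces".
    """
    value = cell.strip()
    # Accept either separator, since the file may be amended to use ';'.
    parts = [p.strip().lower() for p in value.replace(";", ",").split(",")]
    parts = [p for p in parts if p]
    if "understands & says" in parts:
        return "produces"
    if "understands only" in parts:
        return "understands"
    # Anything else (including an empty cell) is passed through unchanged.
    return value
```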

alvinwmtan · 2023-11-29T00:50:45Z

@HenryMehta

WS too big: hmm, I'm not really sure what an alternative solution would be. It would be sad to have to drop some rows—it just happens that the form for this language is particularly large...
WG negation_words: function_words is good for them.
"Understands ONLY, Understands & Says": let's map these to "produces" as you mentioned. It should be okay to amend the original file.

HenryMehta · 2023-12-04T13:12:09Z

@alvinwmtan

Arabic (Saudi) WG is now available to test.

I cannot load WS until we have a decision about whether we could use "u" instead of "understands" and "p" instead of "produces". This would need to apply across all datasets and would impact the Shiny app, as previously mentioned.

alvinwmtan · 2023-12-04T21:26:53Z

(fixing by switching to "u" and "p", as in #298)

mcfrank · 2023-12-04T21:31:01Z

I endorse this suggestion since it may come up again and will generally save space. But we do need to update the shiny apps as noted. @mikabr may need to update. Will we need to change all instruments or are "understands" and "u" now both options?


HenryMehta · 2023-12-05T10:55:59Z

@alvinwmtan We still have an issue here. I am now getting a "Too many columns" error. I've done some reading about this and I cannot increase any parameters to allow more fields. I therefore propose we split the Arabic (Saudi) WS into 2 files and hence 2 tables.

HenryMehta · 2023-12-05T10:56:54Z

> I endorse this suggestion since it may come up again and will generally save space. But we do need to update the shiny apps as noted. @mikabr may need to update. Will we need to change all instruments or are "understands" and "u" now both options?

@mcfrank @alvinwmtan
For now I've applied it to French (French) WS plus all future instruments added

alvinwmtan · 2023-12-05T16:35:54Z

@HenryMehta Hm okay. Do you know what the column limit is?

HenryMehta · 2023-12-05T17:39:31Z

@alvinwmtan It's not actually that simple because it also depends on the column names. I could probably work it out but it would take some time. I think we should aim to keep the max to 750.
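As an illustration of the proposed split, the sketch below chunks a list of item columns so that no table exceeds a chosen cap. The cap of 750 is the target suggested above. The InnoDB limit of 1017 columns per table is a documented figure, but whether this database uses InnoDB is an assumption, and the helper name and the `item_N` naming are for illustration only.

```python
# Sketch: split a wide list of item columns into tables under a column cap.
# Hypothetical helper; the real loader may split differently.

INNODB_MAX_COLUMNS = 1_017  # documented InnoDB per-table column limit


def split_columns(columns: list[str], max_per_table: int = 750) -> list[list[str]]:
    """Chunk columns, in order, into groups of at most max_per_table."""
    if max_per_table <= 0:
        raise ValueError("max_per_table must be positive")
    return [
        columns[start:start + max_per_table]
        for start in range(0, len(columns), max_per_table)
    ]


# 1079 items, as reported for the Arabic (Saudi) WS form.
items = [f"item_{n}" for n in range(1, 1080)]
tables = split_columns(items)
```

With 1079 items and a cap of 750 this yields two tables of 750 and 329 columns. A purely positional split like this ignores item type, so a split along meaningful lines may be preferable.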

alvinwmtan · 2023-12-05T17:41:35Z

@HenryMehta Given that the size of the col names also matters, do you think it might be possible to retain the full table if we converted all the colnames to just numbers? That would reduce the size. If not I'll think about how to split the dataset up.

HenryMehta · 2023-12-05T17:51:34Z

@alvinwmtan We could try but I don't know how many columns that would give us, and the names would actually need changing for every study because of the way the application works. We would need to change the code as well because column names are currently called 'item_xx', where xx is the column number. We could reduce the name to 'ixx' (rather than just a number) because column names must start with a letter.
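To put a number on the renaming idea, the sketch below totals the characters used by column names under each scheme. Numbering from 1 to 1079 is an assumption, and the thread does not establish how much (if at all) shorter names would raise the column limit.

```python
# Sketch: total characters used by item column names under two schemes.
# Assumes columns are numbered 1..n_items.

def total_name_chars(prefix: str, n_items: int) -> int:
    """Sum of name lengths for columns named prefix + number."""
    return sum(len(f"{prefix}{n}") for n in range(1, n_items + 1))


long_names = total_name_chars("item_", 1079)   # e.g. item_42
short_names = total_name_chars("i", 1079)      # e.g. i42
saved = long_names - short_names
```

The saving is 4 characters per column, so 4,316 characters across 1079 columns.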

alvinwmtan · 2023-12-14T05:57:54Z

@HenryMehta Here is one attempt: I've separated the words (WS) and all other item types (WSOther); WS still has >800 items but hopefully it will be okay. The WS from Alroqi is unchanged. Let me know if this split is still too large and I will find a different solution.

[ArabicSaudi_WS].csv
[ArabicSaudi_WSOther].csv
ArabicSaudiWS_JISH_data.csv
ArabicSaudiWS_JISH_fields.csv
ArabicSaudiWS_JISH_values.csv
ArabicSaudiWSOther_JISH_data.csv
ArabicSaudiWSOther_JISH_fields.csv
ArabicSaudiWSOther_JISH_values.csv
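The kind of split described here (words in WS, every other item type in WSOther) could be sketched as follows. The `item_kind` key and the value `"word"` are hypothetical stand-ins for however item types are recorded in the instrument file.

```python
# Sketch: partition instrument items into WS (words) and WSOther (the rest).
# The "item_kind" key and its values are assumptions for illustration.

def partition_items(items: list[dict]) -> dict[str, list[dict]]:
    """Group items into 'WS' (words) and 'WSOther' (all other kinds)."""
    groups: dict[str, list[dict]] = {"WS": [], "WSOther": []}
    for item in items:
        target = "WS" if item.get("item_kind") == "word" else "WSOther"
        groups[target].append(item)
    return groups


example = [
    {"itemID": "item_1", "item_kind": "word"},
    {"itemID": "item_2", "item_kind": "word_form"},
    {"itemID": "item_3", "item_kind": "word"},
    {"itemID": "item_4", "item_kind": "complexity"},
]
split = partition_items(example)
```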

HenryMehta · 2023-12-14T16:45:53Z

@alvinwmtan You've split the JISH files but not the Alroqi

alvinwmtan · 2023-12-14T16:48:09Z

@HenryMehta I believe the Alroqi files are all still within "WS" (only the JISH had items that now fall in "WSOther")

HenryMehta · 2023-12-14T16:48:59Z

OK

HenryMehta · 2023-12-14T17:32:18Z

@alvinwmtan Deploying to dev now - will need about 40 minutes to load

mikabr · 2023-12-15T17:12:30Z

I've implemented allowing "u" and "p" values in wordbankr, but none of the Saudi Arabic tables seem to have those values, and the WSOther table seems to have zero rows (I'm connecting to wordbank2-dev-3).
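A reader that tolerates both the long and the short codes might look like the sketch below. This is not wordbankr's implementation (wordbankr is an R package); it is a Python illustration with hypothetical function names.

```python
# Sketch: accept both long ("understands", "produces") and short ("u", "p")
# response codes. Illustrative only; not the wordbankr code.

SHORT_TO_LONG = {"u": "understands", "p": "produces"}
LONG_TO_SHORT = {long: short for short, long in SHORT_TO_LONG.items()}


def to_long(value: str) -> str:
    """Expand a short code; leave long codes and other values unchanged."""
    return SHORT_TO_LONG.get(value, value)


def to_short(value: str) -> str:
    """Abbreviate a long code; leave short codes and other values unchanged."""
    return LONG_TO_SHORT.get(value, value)
```

Expanding on read means downstream code sees the same values whether an instrument was loaded before or after the switch.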

alvinwmtan · 2023-12-18T19:19:50Z

@HenryMehta WS looks good, don't seem to see any WSOther data

HenryMehta · 2023-12-19T13:00:07Z

@alvinwmtan try now

alvinwmtan · 2023-12-19T20:32:12Z

@HenryMehta WS and WSOther look good. I realised I also failed to disambiguate some of the items in the WG; these should be de-conflicted now:

ArabicSaudiWG_Alroqi_data.csv
ArabicSaudiWG_Alroqi_fields.csv

HenryMehta · 2023-12-20T10:33:45Z

@alvinwmtan You've re-introduced the cells with "understands only, understands & says" instead of just one. I have previously changed these to "understands & says". I have reapplied this change

alvinwmtan · 2023-12-20T18:00:51Z

@HenryMehta thanks for catching that; looks good to me now!

mcfrank added the data label May 11, 2023

HenryMehta self-assigned this Dec 4, 2023

HenryMehta added a commit that referenced this issue Dec 14, 2023

#295 Arabic (Saudi)

fb54774

HenryMehta added a commit that referenced this issue Dec 19, 2023

#295 Update values for Arabic WSOther

1552d4a

HenryMehta added a commit that referenced this issue Dec 20, 2023

#295 Amend Alroqi WG

98f7ba6

HenryMehta closed this as completed Feb 14, 2024
