In [1]:
import pandas as pd
import numpy as np

In [2]:
df = pd.read_csv("efcamdat_final.csv")

In [3]:
rubric = pd.read_csv("../cefr_desc.csv")

In [4]:
rubric

Unnamed: 0,No,CEFR Descriptor Scheme (updated),Mode of\ncommunication,"Activity, strategy or competence",Scale,Level,Descriptor
0,1.0,Communicative language activities,Reception,Oral comprehension,Overall oral comprehension,C2,Can understand with ease virtually any kind of...
1,2.0,Communicative language activities,Reception,Oral comprehension,Overall oral comprehension,C1,Can understand enough to follow extended disco...
2,3.0,Communicative language activities,Reception,Oral comprehension,Overall oral comprehension,C1,Can recognise a wide range of idiomatic expres...
3,4.0,Communicative language activities,Reception,Oral comprehension,Overall oral comprehension,C1,Can follow extended discourse even when it is ...
4,5.0,Communicative language activities,Reception,Oral comprehension,Overall oral comprehension,B2+,Can understand standard language or a familiar...
...,...,...,...,...,...,...,...
1830,1831.0,Signing competences,,Pragmatic competence,Signing fluency,A2,Can indicate the end of a sentence clearly by ...
1831,1832.0,Signing competences,,Pragmatic competence,Signing fluency,A1,No descriptors available
1832,,,,,,,
1833,,,,,,,


In [5]:
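# Inspect the available columns and activity types before filtering.
print(rubric.columns.tolist())
print(rubric['Activity, strategy or competence'].unique())

# Keep only the written-mode descriptors. .copy() makes wrubric an independent
# frame, so the level edits below do not trigger SettingWithCopyWarning.
written = ['Written production', 'Written interaction']
wrubric = rubric.loc[rubric['Activity, strategy or competence'].isin(written)].copy()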

In [6]:
wrubric = wrubric[wrubric['Level'] != 'Pre-A1']

In [7]:
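# Fold the CEFR "plus" levels into the next full level. Assign to the 'Level'
# column only: .loc[mask] = value without a column label overwrites every column
# of the matching rows, including 'Descriptor'.
wrubric.loc[wrubric['Level'] == 'A2+', 'Level'] = 'B1'
wrubric.loc[wrubric['Level'] == 'B1+', 'Level'] = 'B2'
wrubric.loc[wrubric['Level'] == 'B2+', 'Level'] = 'C1'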
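Assigning a scalar through `.loc[mask]` with no column label replaces every column of the matching rows, not just `Level`. A minimal check on a toy frame (the rows and descriptor strings are invented placeholders, not taken from the real rubric) shows the difference between the row-only and the row-and-column forms:

```python
import pandas as pd

# Toy stand-in for wrubric; descriptors are made-up placeholders.
toy = pd.DataFrame({
    "Level": ["A2", "A2+", "B1"],
    "Descriptor": ["desc a2", "desc a2 plus", "desc b1"],
})

# Row-only indexer: every column of the A2+ row becomes "B1".
wrong = toy.copy()
wrong.loc[wrong["Level"] == "A2+"] = "B1"

# Row and column indexer: only the Level cell of that row changes.
right = toy.copy()
right.loc[right["Level"] == "A2+", "Level"] = "B1"

print(wrong.loc[1].tolist())  # ['B1', 'B1']
print(right.loc[1].tolist())  # ['B1', 'desc a2 plus']
```

With the row-only form, the merged levels would contribute the literal string "B1" (or "B2", "C1") to the joined descriptors instead of the real descriptor text.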

In [8]:
wrubric = wrubric.groupby('Level')['Descriptor'].apply(lambda x: '; '.join(x)).reset_index()

In [9]:
# Map each CEFR level to its joined descriptor string.
rub = dict(zip(wrubric['Level'], wrubric['Descriptor']))
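The group-and-join in `In [8]` and the level-to-text dictionary in `In [9]` can be checked on a small toy frame. This sketch uses invented descriptors and collapses both steps into `groupby(...).agg` followed by `Series.to_dict()`, which avoids iterating with `iterrows()`:

```python
import pandas as pd

# Toy rubric rows; the real descriptors come from cefr_desc.csv.
toy = pd.DataFrame({
    "Level": ["B1", "A1", "B1"],
    "Descriptor": ["Can write a letter.", "Can fill in a form.", "Can take notes."],
})

# Concatenate all descriptors of a level into one '; '-separated string.
# groupby sorts the keys, and rows keep their original order within a group.
joined = toy.groupby("Level")["Descriptor"].agg("; ".join)

# Series indexed by level -> {level: descriptors} in one call.
rub_toy = joined.to_dict()
print(rub_toy)
# {'A1': 'Can fill in a form.', 'B1': 'Can write a letter.; Can take notes.'}
```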

In [10]:
rub

{'A1': 'Can give information about matters of personal relevance (e.g. likes and dislikes, family, pets) using simple words/signs and basic expressions.; Can produce simple isolated phrases and sentences.; Can produce simple phrases and sentences about themselves and imaginary people, where they live and what they do.; Can describe in very simple language what a room looks like.; Can use simple words/signs and phrases to describe certain everyday objects (e.g. the colour of a car, whether it is big or small).; No descriptors available; Can ask for or pass on personal details.; Can compose messages and online postings as a series of very short sentences about hobbies and likes/dislikes, using simple words and formulaic expressions, with reference to a dictionary.; Can compose a short, simple postcard.; Can compose a short, very simple message (e.g. a text message) to friends to give them a piece of information or to ask them a question.; Can fill in numbers and dates, own name, national

In [181]:
# Render the rubric as one "Level: Description" line per level, matching the
# format the preamble announces (str(rub) would give a raw dict repr instead).
rubric_text = "\n".join(f"{level}: {desc}" for level, desc in rub.items())

preamble = f"""You are an agent designed to provide feedback on writing samples according to the provided rubric. You will first
receive the rubric, then the writing sample and the classification it was assigned. Use this information to explain
why the writing sample falls under that classification based on the rubric.

The rubric lists one classification level per line, in the format "Classification level: Description".
Here is the rubric:

{rubric_text}
"""

In [183]:
# Draw one essay each from the A band (A1, A2), the B band (B1, B2) and C1.
A_test = df[df['cefr_numeric'].isin([1, 2])].sample(1)
B_test = df[df['cefr_numeric'].isin([3, 4])].sample(1)
C_test = df[df['cefr_numeric'] == 5].sample(1)
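The three `sample(1)` calls draw a different essay on every run, so the explanations below cannot be regenerated exactly. A sketch of a repeatable alternative, assuming pandas 1.1 or newer (for `GroupBy.sample`) and using a toy frame in place of `efcamdat_final.csv`, draws one row per band with a fixed `random_state`:

```python
import pandas as pd

# Toy frame with the same cefr_numeric column as df.
toy = pd.DataFrame({
    "text": [f"essay {i}" for i in range(10)],
    "cefr_numeric": [1, 1, 2, 2, 3, 3, 4, 4, 5, 5],
})

# Map each numeric level to its CEFR band letter (A: 1-2, B: 3-4, C: 5-6),
# mirroring the label mapping used in explain().
band = toy["cefr_numeric"].map({1: "A", 2: "A", 3: "B", 4: "B", 5: "C", 6: "C"})

# One row per band; a fixed random_state makes the draw repeatable across runs.
samples = toy.groupby(band).sample(n=1, random_state=0)
print(samples)
```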

In [184]:
import os
import cohere

# Read the API key from the environment instead of hard-coding it in the notebook;
# create the client once rather than on every call.
co = cohere.Client(os.environ["CO_API_KEY"])

def explain(text, label, preamble):
    label_mappings = {1: 'A1', 2: 'A2', 3: 'B1', 4: 'B2', 5: 'C1', 6: 'C2'}
    label_cat = label_mappings[label]

    print(f"Student Text: {text}")
    print(f"Proficiency Category: {label_cat}")

    # max_tokens=250 caps the reply length, so longer explanations may be cut off mid-sentence.
    response = co.chat(
        message=f"Please explain to me why this piece of text {text} was classified as {label_cat}. "
                "Give me specific details why it is not classified higher.",
        preamble=preamble,
        max_tokens=250,
    )
    return response
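`explain()` builds the prompt and calls the API in one step, so the exact message sent to the model can only be seen by spending a request. A sketch that splits the message construction into a separate, hypothetical helper `build_message` (not part of the original notebook) makes it checkable offline. Wrapping the essay in triple quotes is an assumed choice to keep the learner's text visibly separate from the instruction:

```python
# Hypothetical helper split out of explain(); the wording follows the original prompt.
LABELS = {1: "A1", 2: "A2", 3: "B1", 4: "B2", 5: "C1", 6: "C2"}

def build_message(text: str, label: int) -> str:
    """Return the user message for one essay and its numeric CEFR label."""
    level = LABELS[int(label)]  # int() also accepts numpy integers from .values[0]
    return (
        f"Please explain why the following text was classified as {level}. "
        "Give specific reasons why it was not classified higher.\n\n"
        f'Text: """{text.strip()}"""'
    )

msg = build_message("  Hi, can you buy me some bread?  ", 1)
print(msg)
```

`explain()` could then pass `build_message(text, label)` as the `message` argument, keeping the API call itself unchanged.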

# Sample explanations for A-, B- and C-level texts

In [185]:
explain(A_test['text'].values[0], A_test['cefr_numeric'].values[0], preamble).text

Student Text: 
	  Hi, can you buy me:  A bottle of red wine, a loaf of bread, a bag of rice, a packet of chips, one box of ice cream, some tomatoes and some pork.  Thanks.  When you go back, I'll invite you for dinner in my house!
	
Proficiency Category: A1


'This piece of text is classified as A1 because it demonstrates the ability to use simple words and phrases to convey a message about matters of personal relevance (in this case, requesting someone to buy some groceries and inviting them for dinner). Here are the specific reasons why it falls under the A1 classification and not higher:\n\n- **Limited vocabulary and sentence structure:** The text uses simple words and phrases like "a bottle of," "a loaf of," and "a bag of," indicating a basic vocabulary level. The sentences are short and mainly consist of subject-verb-object structures, which is typical of A1-level language production.\n- **Lack of complex connectors:** The text does not demonstrate the use of complex connectors like "because," "although," or "despite," which are characteristic of higher-level language production. The only connector used is "and," which is a basic connector.\n- **No abstract or complex ideas:** The content of the text is concrete and practical, focusing

In [186]:
explain(B_test['text'].values[0], B_test['cefr_numeric'].values[0], preamble).text

Student Text: 
	  Instructions for Frisbee bowling Please. We need to mark an area of 8 meters by 3 meters.  Lets use ten plastic bottles as 'bowling pins' with a little of water to make them heavier. Line up the bottles in line of 4, then 3, then 2 and then 1, like ten-pine bowling. Each player will get a frisbee and do two shots on each turn. To score the player needs to knock down each pin. The player with the most points is the winner.   
	
Proficiency Category: B1


'The text is classified as B1 because it demonstrates the ability to produce a straightforward, connected text on a familiar topic, in this case, a set of instructions for a game. The description is clear and easy to follow, with a logical structure that helps the reader understand the rules and objective of the game. \n\nHowever, there are several reasons why it is not classified at a higher level, such as B2 or C1: \n\n- Limited complexity: The text does not demonstrate the ability to synthesize or evaluate information from multiple sources, which is a characteristic of higher levels. It focuses on providing clear instructions rather than analyzing or discussing the game in a more complex manner. \n- Lack of argumentation: Higher levels, especially B2 and above, often involve developing arguments and expressing opinions. This text does not include any personal views or attempts to persuade the reader, which is typical of B1-level writing. \n- Restricted vocabulary and grammar: While 

In [187]:
explain(C_test['text'].values[0], C_test['cefr_numeric'].values[0], preamble).text

Student Text: 
	  I'd like to start by describing my so called physiolocical needs, which I've achieved generally. Secondly on the ladder appears the safety aspect. Well, I am actually quite comfortable. Everything's fine with my family and every family member feels fine. So there is nothing to criticize. I'd like to get on with the "love thing ". I am happily married for about 30 years. That's all I'd like to tell you. The intension is on happily here. There is nothing else to say. :) Esteem: I am a hard working guy and trying to give the best in job every day. I am expecting just a little  acknowledgment from time to time. Perhaps a wee bit more as usally . :) Right, last but not least I am not on the top of the ladder. There are a few things which will have to be improved. I really should spend more time with my wife. Work should not play a main role in our life anymore. I'd like to perfect my language skills within the next 3 years as well as  perhaps to start learning a second for

'The text is classified as C1 because it demonstrates the ability to produce a clear and well-structured text with complex sentences and an advanced vocabulary. The writer effectively conveys their thoughts and feelings on a personal topic, using appropriate tone and style for the audience. \n\nHowever, there are a few reasons why it is not classified higher, at C2:\n- While the text is clear and well-structured, it does not demonstrate the same level of complexity and sophistication in sentence structure and vocabulary that would be expected at the C2 level. It does not, for example, include much idiomatic or humorous expression, which is something that a C2-level text might incorporate. \n- The text also does not demonstrate the same level of flexibility in tone and style that a C2-level text might. While the writer does adapt their tone and style appropriately for the audience and topic, there is room for more variation and nuance in their expression. \n- Finally, while the text is 

In [63]:
df

Unnamed: 0,id,level,unit,learner_id,learner_nationality,grade,date,topic_id,text,cefr_numeric,cefr_grouped
0,679604,7,3,114335,br,90,2013-11-09 20:50:31.707,51,From:l AS xxx@hotmail.com To: AS xxx@IXW.corpo...,3,2
1,151196,9,2,136139,sa,94,2012-09-25 06:01:08.117,66,I am so glad to receive this email from you. A...,3,2
2,117084,9,4,34715,br,88,2011-08-28 08:01:15.677,68,"Hi Fun Skydive, so I give up of my idea. I und...",3,2
3,113857,7,6,90269,fr,90,2011-07-31 14:51:22.547,54,"Dear James, Some serious problems have been br...",3,2
4,22083,9,3,48465,br,94,2011-08-31 16:41:04.210,67,"Dear Sue, Thank you to interest in our product...",3,2
...,...,...,...,...,...,...,...,...,...,...,...
377962,1211530,10,3,142733,br,95,2014-03-03 16:17:52.003,75,"As we live in Brazil, always is a difficult ti...",4,2
377963,1211531,10,4,142733,br,95,2014-03-04 18:23:19.820,76,Our last manager meeting included a presentati...,4,2
377964,1211532,10,5,142733,br,90,2014-03-10 22:28:52.313,77,I always practice sports. I think that my firs...,4,2
377965,1211533,10,6,142733,br,80,2014-03-17 20:50:46.250,78,"I know that it take AG takes sometime, but now...",4,2


In [170]:
df[df['cefr_grouped'] == 3].sample(1)['text'].values[0]

"Brazilians are a communicative people WC lot . They like to do business face to face. In the Brazilian's PO Brazilian business culture PU , the D interpersonal relations are crucial, once WC since they are the D keys SI key MW if you want to the sucess WC succeed . When you start working with braziians SP Brazilians you will realize the trend to PR of mix WC mixing the D professional and personal relations. Your coworkers will appreciate more your kindness WO your kindness more than your professionalism. Before either a meeting or a business MW interaction, dinner is extremelly SP extremely recommendable WC recommended MW so that you learn some words in Portuguese. You will be very appreciated if you try to speak any word in Portuguese, it seems WC will seem thay SP that you are a dedicated person. Despite to schedule WC scheduling a meeting 2 to 3 weeks before, it is normal p PR for the brazilians SP Brazilians PR to cancel or change the date without prior notice. The man should wear