## Creating Alignment data using Tag Augmentation

#### Setting up config.ini

The code is config-driven: everything it needs is read from config.ini, including file paths, file names, and other parameters specific to each test.

To create an alignment test set, you need a golden dataset: an Excel file containing questions and their expected answers. The questions should be written with tags, which are replaced by the values defined in the config file. A sample golden dataset and config.ini can be found in the data section of the examples folder.

Let's define the config.ini file for the alignment function. (The complete config.ini file and its sample values can be seen [here](https://github.com/michelin/LLMInspector/wiki/Getting-Started)):

```ini
[Alignment_File]
Alignment_input_FilePath = User/input_file_directory/
Alignment_GoldenDataset_FileName = input_file_name.xlsx

Alignment_output_path = User/output_file_directory/
Alignment_Output_fileName =  /output_file_name_
paraphrase_count = 2
augmentations = {
    'uppercase': ('Robustness', 1),
    'lowercase': ('Robustness', 1),
    'typo': ('Robustness', 1),
    'add_punctuation': ('Robustness', 1),
    'strip_punctuation': ('Robustness', 1),
    'context': ('Robustness', 1),
    'titlecase': ('Robustness', 1),
    'contract': ('Robustness', 1),
    'abbreviate': ('Robustness', 1),}

[Tag_Augmentation_Key]
tag_keyword_dict = {
    '{greeting}' : ['Hi','Hey', 'Hola', 'Namaste'],
    '{seasonality}' : ['summer', 'winter', 'rain', 'all season'],
    '{road_condition}' : ['dry', 'off-road', 'wet', 'snow'],
    '{brand_name}': ['Apple', 'Tesla','Tata'],
    '{product_name}': ['Macbook Pro', 'Apple Watch', 'Macbook Mini', 'Model X'],
    '{company_&_location}': ["Apple, California"],
    '{department}': ['Engineering', 'Design', 'IT', 'Manufacturing'],
    '{country_name}': ['India', 'United States', 'China'],
    '{city_name}': ['New York', 'Chicago', 'Houston', 'Los Angeles'],
    '{nationality}': ['Indian', 'American', 'French'],
    '{language}': ['English', 'French', 'Hindi'],
    '{activity}': ['Photography', 'Trekking', 'Singing'],
    '{mood_positive}': ['happy', 'excited', 'amazing'],
    '{mood_negative}': ['sad', 'upset', 'angry'],
    }

[Augmentation_Type]
augmentation_dict = {
    'Greetings' : ['greeting'],
    'Seasons' : ['seasonality'],
    'Rating': ['Season', 'Rating'],
    'Road Conditions':['road_condition'],
    'Brands' : ['brand_name'],
    'Products': ['product_name'],
    'Countries':['country_name'],
    'Cities': ['city_name'],
    'Nationalities': ['nationality'],
    'Languages': ['language'],
    'Activities': ['activity'],
    'Moods': ['mood_positive', 'mood_negative'],
    'Locations': ['company_&_location'],
    'Job Departments': ['department']
    }
```

The config above lists the required variables with sample values. Set the paths and file names to match your own project.
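As a rough illustration of how dictionary-valued entries like these can be read, here is a minimal sketch using Python's standard configparser and ast.literal_eval. It is not necessarily how LLMInspector parses the file internally; the function name load_alignment_settings and the trimmed SAMPLE_CONFIG string are hypothetical and exist only for this example.

```python
import ast
import configparser

# Trimmed copy of the sample config, kept short for illustration only.
SAMPLE_CONFIG = """
[Alignment_File]
paraphrase_count = 2
augmentations = {
    'uppercase': ('Robustness', 1),
    'typo': ('Robustness', 1),}

[Tag_Augmentation_Key]
tag_keyword_dict = {
    '{greeting}': ['Hi', 'Hey'],
    '{language}': ['English', 'French'],
    }
"""


def load_alignment_settings(config_text):
    """Parse the alignment-related entries into Python objects (hypothetical helper)."""
    # Interpolation is disabled so that values are read exactly as written.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(config_text)
    section = parser["Alignment_File"]
    return {
        "paraphrase_count": section.getint("paraphrase_count"),
        # Indented continuation lines are joined into one string,
        # which literal_eval then turns into a dict.
        "augmentations": ast.literal_eval(section["augmentations"]),
        "tag_keyword_dict": ast.literal_eval(
            parser["Tag_Augmentation_Key"]["tag_keyword_dict"]
        ),
    }


settings = load_alignment_settings(SAMPLE_CONFIG)
print(settings["augmentations"]["uppercase"])
```

Note that the continuation lines of each dictionary must stay indented relative to the key, otherwise configparser treats them as new options.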

The tag augmentation keys are the tags that appear in the golden dataset. Each tag is replaced by the values given in its list. Tags must be written within '{}'.
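The substitution step can be pictured with the sketch below. It assumes that a question containing several tags is expanded into every combination of their values; whether LLMInspector expands combinations or handles one tag at a time is not stated here, so treat that as an assumption. The helper expand_tags and the sample question are hypothetical.

```python
import re
from itertools import product


def expand_tags(question, tag_keyword_dict):
    """Return one sentence per combination of values for the known tags in `question`."""
    # Collect the distinct '{tag}' placeholders in order of first appearance,
    # keeping only those that have values defined in the config.
    found = dict.fromkeys(re.findall(r"\{[^{}]+\}", question))
    tags = [tag for tag in found if tag in tag_keyword_dict]
    if not tags:
        return [question]

    sentences = []
    for values in product(*(tag_keyword_dict[tag] for tag in tags)):
        sentence = question
        for tag, value in zip(tags, values):
            sentence = sentence.replace(tag, value)
        sentences.append(sentence)
    return sentences


tag_keyword_dict = {
    "{greeting}": ["Hi", "Hey"],
    "{road_condition}": ["dry", "wet"],
}
sentences = expand_tags(
    "{greeting}, which tyres suit {road_condition} roads?", tag_keyword_dict
)
for sentence in sentences:
    print(sentence)
```

Tags that have no entry in tag_keyword_dict are left untouched in this sketch, which makes missing definitions easy to spot in the output.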

Sentences whose tags have been replaced are then augmented according to the augmentations entry in the config. Each key of that dictionary is the operation applied to the sentence; each value is a tuple containing the capability being tested and the fraction of tag-replaced sentences to augment. For example, {'uppercase': ('Robustness', 0.5)} applies uppercase to a random 50 percent of the tag-replaced sentences, and the sample is drawn separately for each augmentation type.
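The sampling behaviour described above can be sketched as follows. This is an illustrative approximation only: the function apply_augmentations, the operations mapping, the rounding rule, and the fixed seed are all assumptions, and only three simple string operations are implemented rather than the full list from the config.

```python
import random


def apply_augmentations(sentences, augmentations, operations, seed=0):
    """Apply each configured operation to its own random sample of sentences."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    rows = []
    for name, (capability, fraction) in augmentations.items():
        operation = operations.get(name)
        if operation is None:
            continue  # operation not implemented in this sketch
        # Assumed rule: round the requested share, never exceeding the input size.
        sample_size = min(len(sentences), round(len(sentences) * fraction))
        for sentence in rng.sample(sentences, sample_size):
            rows.append(
                {
                    "augmentation": name,
                    "capability": capability,
                    "original": sentence,
                    "augmented": operation(sentence),
                }
            )
    return rows


# Stand-in implementations for three of the operations named in the config.
operations = {"uppercase": str.upper, "lowercase": str.lower, "titlecase": str.title}
augmentations = {"uppercase": ("Robustness", 0.5), "lowercase": ("Robustness", 1)}
tagged_sentences = [
    "Hi, which tyres suit dry roads?",
    "Hi, which tyres suit wet roads?",
    "Hey, which tyres suit dry roads?",
    "Hey, which tyres suit wet roads?",
]

rows = apply_augmentations(tagged_sentences, augmentations, operations)
for row in rows:
    print(row["augmentation"], "->", row["augmented"])
```

With four input sentences, a fraction of 0.5 yields two uppercase rows and a fraction of 1 yields four lowercase rows.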

Each key of augmentation_dict is an augmentation type, and its value is a list of tag names (written without braces) that are defined in tag_keyword_dict.
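Because the two dictionaries must agree, a quick consistency check before running can be useful. The sketch below is a hypothetical helper, not part of LLMInspector, and it is not known how the library itself reacts to a mismatch. In the sample config above, the 'Rating' entry lists 'Season' and 'Rating', neither of which has a counterpart in tag_keyword_dict, so the check reports it.

```python
def find_undefined_tags(augmentation_dict, tag_keyword_dict):
    """Return {augmentation type: [tag names missing from tag_keyword_dict]}."""
    # tag_keyword_dict keys carry braces, augmentation_dict values do not.
    defined = {tag.strip("{}") for tag in tag_keyword_dict}
    missing = {}
    for augmentation_type, tags in augmentation_dict.items():
        unknown = [tag for tag in tags if tag not in defined]
        if unknown:
            missing[augmentation_type] = unknown
    return missing


# Trimmed versions of the sample dictionaries.
tag_keyword_dict = {
    "{greeting}": ["Hi", "Hey", "Hola", "Namaste"],
    "{mood_positive}": ["happy", "excited", "amazing"],
    "{mood_negative}": ["sad", "upset", "angry"],
}
augmentation_dict = {
    "Greetings": ["greeting"],
    "Rating": ["Season", "Rating"],
    "Moods": ["mood_positive", "mood_negative"],
}

undefined = find_undefined_tags(augmentation_dict, tag_keyword_dict)
print(undefined)
```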

The paraphrase count is the number of paraphrased sentences generated from each augmented sentence.

#### Code Execution

```python
# Import the LLMInspector entry-point class
from llm_inspector.llminspector import llminspector

# Initialise the class object with the config and environment files
obj = llminspector(config_path="config.ini", env_path=".env")

# Create the alignment data
obj.alignment()
```