# General information
The ChilectoUtility package was developed for the Chilecto project. It contains utilities and pre-processors for several Chinese corpora.

The package was developed on Linux with Python 3.7. Compatibility with other platforms or Python versions has not been tested.

This file introduces only the generic functions. For the processor classes, refer to the other tutorials in this folder.

## Requirements
The following packages are required:
1. pandas
2. tqdm
3. hanziconv
4. numpy
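
Before importing the package, it can help to check that these dependencies are importable. The following is a minimal sketch using only the standard library; the helper name `missing_packages` is an illustration and is not part of ChilectoUtility.

```python
import importlib.util

# Package names taken from the requirements list above.
REQUIRED = ["pandas", "tqdm", "hanziconv", "numpy"]


def missing_packages(names):
    """Return the names that cannot be found on the current import path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_packages(REQUIRED)
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All required packages are available.")
```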



## Initialize
This step is needed only if the ChilectoUtility package is not on the system search path or in the current folder.

In [1]:
import os
import sys
package_dir = '../../' # the location of the ChilectoUtility package folder
sys.path.insert(0, os.path.abspath(package_dir))

In [2]:
# import libs
import ChilectoUtility as cu

## Randomly select files
Using the **rand\_file\_in\_dir** function in **ChilectoUtility.gen**, one can randomly select a given percentage of the files in a folder and write their names to an output file, one file name per line.
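
To make the idea concrete, here is a rough standard-library sketch of percentage-based file sampling. It is not the implementation of `rand_file_in_dir`; the function name `sample_files`, the `seed` parameter, and the choice to pick exactly `round(n * percentage / 100)` files are assumptions made for illustration.

```python
import os
import random


def sample_files(input_dir, percentage, out_file, recursive=True, seed=None):
    """Pick about `percentage` percent of the files under input_dir.

    The selected paths are written to out_file, one per line, and returned.
    """
    if recursive:
        paths = [os.path.join(root, name)
                 for root, _dirs, names in os.walk(input_dir)
                 for name in names]
    else:
        paths = [os.path.join(input_dir, name)
                 for name in os.listdir(input_dir)
                 if os.path.isfile(os.path.join(input_dir, name))]
    paths.sort()  # fixed order so that a given seed is reproducible
    k = round(len(paths) * percentage / 100)
    selected = sorted(random.Random(seed).sample(paths, k))
    with open(out_file, "w", encoding="utf-8") as f:
        f.write("\n".join(selected))
    return selected
```

In this sketch the output file should live outside `input_dir`; otherwise it would itself become a candidate on the next run.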

#### randomly select 250m for TW

In [7]:
# percentage of files to be selected (0~100)
percentage = 50
# folder from which the files are selected
input_dir = '/home/projects/semmetrix/chilecto/corp/giga20181203/cna_cmn'
# output file name
out_file = '/home/projects/semmetrix/chilecto/corp/giga20181203/tw_250m.fnames'
# whether to process the folder recursively
recursive_flag = True
# select the files
cu.gen.rand_file_in_dir(input_dir, percentage, out_file, recursive_flag)

2021-01-11 15:06:05,810 - cu.gen - INFO - Files randomly selected in Folder /home/projects/semmetrix/chilecto/corp/giga20181203/cna_cmn.


['/home/projects/semmetrix/chilecto/corp/giga20181203/cna_cmn/CNA_CMN_19910101_wpr.csv',
 '/home/projects/semmetrix/chilecto/corp/giga20181203/cna_cmn/CNA_CMN_19910104_wpr.csv',
 '/home/projects/semmetrix/chilecto/corp/giga20181203/cna_cmn/CNA_CMN_19910105_wpr.csv',
 '/home/projects/semmetrix/chilecto/corp/giga20181203/cna_cmn/CNA_CMN_19910106_wpr.csv',
 '/home/projects/semmetrix/chilecto/corp/giga20181203/cna_cmn/CNA_CMN_19910107_wpr.csv',
 '/home/projects/semmetrix/chilecto/corp/giga20181203/cna_cmn/CNA_CMN_19910108_wpr.csv',
 '/home/projects/semmetrix/chilecto/corp/giga20181203/cna_cmn/CNA_CMN_19910111_wpr.csv',
 '/home/projects/semmetrix/chilecto/corp/giga20181203/cna_cmn/CNA_CMN_19910113_wpr.csv',
 '/home/projects/semmetrix/chilecto/corp/giga20181203/cna_cmn/CNA_CMN_19910117_wpr.csv',
 '/home/projects/semmetrix/chilecto/corp/giga20181203/cna_cmn/CNA_CMN_19910122_wpr.csv',
 '/home/projects/semmetrix/chilecto/corp/giga20181203/cna_cmn/CNA_CMN_19910123_wpr.csv',
 '/home/projects/semm

#### randomly select 250m for ML

In [8]:
# percentage of files to be selected (0~100)
percentage = 50
# folder from which the files are selected
input_dir = '/home/projects/semmetrix/chilecto/corp/giga20181203/xin_cmn'
# output file name
out_file = '/home/projects/semmetrix/chilecto/corp/giga20181203/ml_250m.fnames'
# whether to process the folder recursively
recursive_flag = True
# select the files
cu.gen.rand_file_in_dir(input_dir, percentage, out_file, recursive_flag)

2021-01-11 15:07:32,892 - cu.gen - INFO - Files randomly selected in Folder /home/projects/semmetrix/chilecto/corp/giga20181203/xin_cmn.


['/home/projects/semmetrix/chilecto/corp/giga20181203/xin_cmn/XIN_CMN_19910103_wpr.csv',
 '/home/projects/semmetrix/chilecto/corp/giga20181203/xin_cmn/XIN_CMN_19910104_wpr.csv',
 '/home/projects/semmetrix/chilecto/corp/giga20181203/xin_cmn/XIN_CMN_19910106_wpr.csv',
 '/home/projects/semmetrix/chilecto/corp/giga20181203/xin_cmn/XIN_CMN_19910109_wpr.csv',
 '/home/projects/semmetrix/chilecto/corp/giga20181203/xin_cmn/XIN_CMN_19910110_wpr.csv',
 '/home/projects/semmetrix/chilecto/corp/giga20181203/xin_cmn/XIN_CMN_19910114_wpr.csv',
 '/home/projects/semmetrix/chilecto/corp/giga20181203/xin_cmn/XIN_CMN_19910120_wpr.csv',
 '/home/projects/semmetrix/chilecto/corp/giga20181203/xin_cmn/XIN_CMN_19910121_wpr.csv',
 '/home/projects/semmetrix/chilecto/corp/giga20181203/xin_cmn/XIN_CMN_19910123_wpr.csv',
 '/home/projects/semmetrix/chilecto/corp/giga20181203/xin_cmn/XIN_CMN_19910126_wpr.csv',
 '/home/projects/semmetrix/chilecto/corp/giga20181203/xin_cmn/XIN_CMN_19910127_wpr.csv',
 '/home/projects/semm

## Select files based on the number (date) in file name
Using the **select_file_by_num** function in **ChilectoUtility.gen**, one can select the files whose name contains any of the integers in a list, and write the file names to an output file. The function is called as:

    select_file_by_num(dir_in, num_list, out_file, idx1, idx2, recursive_flag)
    
in which:

    dir_in: path of the input folder (string).
    num_list: list of integers used to select files.
    out_file: name of the output file.
    idx1/idx2: start and end index (both inclusive) of the number within the file name; the directory part of the path is ignored. Python indices start at 0. For example, to get the number *123* from the file name *dir1/foo123.txt*, set idx1=3 and idx2=5.
    recursive_flag: if True, the input folder is processed recursively.
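
The index convention can be checked with a small sketch. The helper `number_in_name` below is hypothetical and only mirrors the description above; it assumes that `idx2` is inclusive, as the *foo123.txt* example suggests.

```python
import os


def number_in_name(path, idx1, idx2):
    """Return the integer found at positions idx1..idx2 (inclusive) of the base file name."""
    name = os.path.basename(path)  # drop the directory part of the path
    return int(name[idx1:idx2 + 1])


print(number_in_name("dir1/foo123.txt", 3, 5))                    # 123
print(number_in_name("cna_cmn/CNA_CMN_19910101_wpr.csv", 8, 15))  # 19910101
```

Selecting a file then amounts to testing whether this number is in `num_list`.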

In [7]:
# set input
dir_in = '/home/projects/semmetrix/chilecto/corp/giga20181203/cna_cmn'

out_file = '/home/projects/semmetrix/chilecto/corp/giga20181203/cna_cmn2001_2003.fname'

# generate the number list from 20001001 to 20030930
num_list = list(range(20001001, 20031001))

# file names have the form: CNA_CMN_19910101_wpr.csv
idx1 = 8
idx2 = 15

In [8]:
# run the function
cu.gen.select_file_by_num(dir_in, num_list, out_file, idx1, idx2, True)

['/home/semmetrix/chilecto/corp/giga20181203/cna_cmn/CNA_CMN_20001001_wpr.csv',
 '/home/semmetrix/chilecto/corp/giga20181203/cna_cmn/CNA_CMN_20001002_wpr.csv',
 '/home/semmetrix/chilecto/corp/giga20181203/cna_cmn/CNA_CMN_20001003_wpr.csv',
 '/home/semmetrix/chilecto/corp/giga20181203/cna_cmn/CNA_CMN_20001004_wpr.csv',
 '/home/semmetrix/chilecto/corp/giga20181203/cna_cmn/CNA_CMN_20001005_wpr.csv',
 '/home/semmetrix/chilecto/corp/giga20181203/cna_cmn/CNA_CMN_20001006_wpr.csv',
 '/home/semmetrix/chilecto/corp/giga20181203/cna_cmn/CNA_CMN_20001007_wpr.csv',
 '/home/semmetrix/chilecto/corp/giga20181203/cna_cmn/CNA_CMN_20001008_wpr.csv',
 '/home/semmetrix/chilecto/corp/giga20181203/cna_cmn/CNA_CMN_20001009_wpr.csv',
 '/home/semmetrix/chilecto/corp/giga20181203/cna_cmn/CNA_CMN_20001010_wpr.csv',
 '/home/semmetrix/chilecto/corp/giga20181203/cna_cmn/CNA_CMN_20001011_wpr.csv',
 '/home/semmetrix/chilecto/corp/giga20181203/cna_cmn/CNA_CMN_20001012_wpr.csv',
 '/home/semmetrix/chilecto/corp/giga2018
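
A plain `range` over YYYYMMDD integers also contains values that are not calendar dates (for example 20001032). These match no file name, so they are harmless, but the list is much longer than needed. As an alternative sketch, the list can be built from real dates with the standard `datetime` module; the helper name `date_numbers` is an assumption for illustration.

```python
from datetime import date, timedelta


def date_numbers(start, end):
    """Return YYYYMMDD integers for every calendar day from start to end, inclusive."""
    days = (end - start).days
    return [int((start + timedelta(days=i)).strftime("%Y%m%d"))
            for i in range(days + 1)]


num_list = date_numbers(date(2000, 10, 1), date(2003, 9, 30))
print(num_list[0], num_list[-1], len(num_list))
```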

## Subset three corpora by year and month

In [4]:
# set input TW
dir_in = '/home/projects/semmetrix/chilecto/corp/giga20181203/cna_cmn'

out_file = '/home/projects/semmetrix/chilecto/corp/giga20181203/tw.fnames'

# generate the number list: Oct-Dec 2000, Jan 2001 and Apr-Sep 2003
num_list = list(range(20001000, 20001232)) \
            + list(range(20010100, 20010132)) \
            + list(range(20030400, 20030932))

# file names have the form: CNA_CMN_19910101_wpr.csv
idx1 = 8
idx2 = 15

# run the function
cu.gen.select_file_by_num(dir_in, num_list, out_file, idx1, idx2, True)

['/home/semmetrix/chilecto/corp/giga20181203/cna_cmn/CNA_CMN_20001001_wpr.csv',
 '/home/semmetrix/chilecto/corp/giga20181203/cna_cmn/CNA_CMN_20001002_wpr.csv',
 '/home/semmetrix/chilecto/corp/giga20181203/cna_cmn/CNA_CMN_20001003_wpr.csv',
 '/home/semmetrix/chilecto/corp/giga20181203/cna_cmn/CNA_CMN_20001004_wpr.csv',
 '/home/semmetrix/chilecto/corp/giga20181203/cna_cmn/CNA_CMN_20001005_wpr.csv',
 '/home/semmetrix/chilecto/corp/giga20181203/cna_cmn/CNA_CMN_20001006_wpr.csv',
 '/home/semmetrix/chilecto/corp/giga20181203/cna_cmn/CNA_CMN_20001007_wpr.csv',
 '/home/semmetrix/chilecto/corp/giga20181203/cna_cmn/CNA_CMN_20001008_wpr.csv',
 '/home/semmetrix/chilecto/corp/giga20181203/cna_cmn/CNA_CMN_20001009_wpr.csv',
 '/home/semmetrix/chilecto/corp/giga20181203/cna_cmn/CNA_CMN_20001010_wpr.csv',
 '/home/semmetrix/chilecto/corp/giga20181203/cna_cmn/CNA_CMN_20001011_wpr.csv',
 '/home/semmetrix/chilecto/corp/giga20181203/cna_cmn/CNA_CMN_20001012_wpr.csv',
 '/home/semmetrix/chilecto/corp/giga2018

In [5]:
# set input ML
dir_in = '/home/projects/semmetrix/chilecto/corp/giga20181203/xin_cmn'

out_file = '/home/projects/semmetrix/chilecto/corp/giga20181203/ml.fnames'

# run the function
cu.gen.select_file_by_num(dir_in, num_list, out_file, idx1, idx2, True)

['/home/semmetrix/chilecto/corp/giga20181203/xin_cmn/XIN_CMN_20001001_wpr.csv',
 '/home/semmetrix/chilecto/corp/giga20181203/xin_cmn/XIN_CMN_20001002_wpr.csv',
 '/home/semmetrix/chilecto/corp/giga20181203/xin_cmn/XIN_CMN_20001003_wpr.csv',
 '/home/semmetrix/chilecto/corp/giga20181203/xin_cmn/XIN_CMN_20001004_wpr.csv',
 '/home/semmetrix/chilecto/corp/giga20181203/xin_cmn/XIN_CMN_20001005_wpr.csv',
 '/home/semmetrix/chilecto/corp/giga20181203/xin_cmn/XIN_CMN_20001006_wpr.csv',
 '/home/semmetrix/chilecto/corp/giga20181203/xin_cmn/XIN_CMN_20001007_wpr.csv',
 '/home/semmetrix/chilecto/corp/giga20181203/xin_cmn/XIN_CMN_20001008_wpr.csv',
 '/home/semmetrix/chilecto/corp/giga20181203/xin_cmn/XIN_CMN_20001009_wpr.csv',
 '/home/semmetrix/chilecto/corp/giga20181203/xin_cmn/XIN_CMN_20001010_wpr.csv',
 '/home/semmetrix/chilecto/corp/giga20181203/xin_cmn/XIN_CMN_20001011_wpr.csv',
 '/home/semmetrix/chilecto/corp/giga20181203/xin_cmn/XIN_CMN_20001012_wpr.csv',
 '/home/semmetrix/chilecto/corp/giga2018

In [6]:
# set input SG
dir_in = '/home/projects/semmetrix/chilecto/corp/giga20181203/zbn_cmn'

out_file = '/home/projects/semmetrix/chilecto/corp/giga20181203/sg.fnames'

# run the function
cu.gen.select_file_by_num(dir_in, num_list, out_file, idx1, idx2, True)

['/home/semmetrix/chilecto/corp/giga20181203/zbn_cmn/ZBN_CMN_20001001_wpr.csv',
 '/home/semmetrix/chilecto/corp/giga20181203/zbn_cmn/ZBN_CMN_20001002_wpr.csv',
 '/home/semmetrix/chilecto/corp/giga20181203/zbn_cmn/ZBN_CMN_20001003_wpr.csv',
 '/home/semmetrix/chilecto/corp/giga20181203/zbn_cmn/ZBN_CMN_20001004_wpr.csv',
 '/home/semmetrix/chilecto/corp/giga20181203/zbn_cmn/ZBN_CMN_20001005_wpr.csv',
 '/home/semmetrix/chilecto/corp/giga20181203/zbn_cmn/ZBN_CMN_20001006_wpr.csv',
 '/home/semmetrix/chilecto/corp/giga20181203/zbn_cmn/ZBN_CMN_20001007_wpr.csv',
 '/home/semmetrix/chilecto/corp/giga20181203/zbn_cmn/ZBN_CMN_20001008_wpr.csv',
 '/home/semmetrix/chilecto/corp/giga20181203/zbn_cmn/ZBN_CMN_20001009_wpr.csv',
 '/home/semmetrix/chilecto/corp/giga20181203/zbn_cmn/ZBN_CMN_20001010_wpr.csv',
 '/home/semmetrix/chilecto/corp/giga20181203/zbn_cmn/ZBN_CMN_20001011_wpr.csv',
 '/home/semmetrix/chilecto/corp/giga20181203/zbn_cmn/ZBN_CMN_20001012_wpr.csv',
 '/home/semmetrix/chilecto/corp/giga2018

### one *.fnames* file for all three corpora

In [4]:
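# set input THREE LECTS
# note: num_list, idx1 and idx2 must already be defined in the current session (see the TW cell above)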
dir_in = '/home/projects/semmetrix/chilecto/corp/giga20181203'

out_file = '/home/projects/semmetrix/chilecto/corp/giga20181203/gigasub3lect.fnames'

# run the function
cu.gen.select_file_by_num(dir_in, num_list, out_file, idx1, idx2, True)

NameError: name 'num_list' is not defined
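
If the per-lect *.fnames* files already exist, another way to obtain a single list is to concatenate them. The sketch below is only an illustration: `merge_fnames` is a hypothetical helper, not part of ChilectoUtility, and it assumes each input file holds one file name per line.

```python
def merge_fnames(in_files, out_file):
    """Concatenate several .fnames files into one, dropping empty and duplicate lines."""
    seen = set()
    merged = []
    for path in in_files:
        with open(path, encoding="utf-8") as f:
            for line in f:
                name = line.strip()
                if name and name not in seen:
                    seen.add(name)
                    merged.append(name)
    with open(out_file, "w", encoding="utf-8") as f:
        f.write("\n".join(merged))
    return merged
```

It could be called, for example, with the `tw.fnames`, `ml.fnames` and `sg.fnames` files written above as `in_files`.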

# Select Based on Number of Words

In [5]:
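def select_size_year(counter, year, num_word):
    """Randomly select about num_word words from the files of one year.

    Returns the number of words in the resulting selection.
    """
    # reset any previous selection and keep only the .csv files
    counter.reset()
    counter.filter_include('.csv')
    # keep files whose name has the year at positions 8-11 (CNA_CMN_1991...)
    counter.select_by_num([year], 8, 11)
    total = float(counter.count_sub_file_mp())
    # percentage of this year's words needed to reach the target
    percent = num_word/total*100
    if percent > 100.0:
        # fewer words than requested: keep every file of the year
        print('Corpus does not have enough words')
        select_word = counter.sub_total_word
    else:
        # report selections that keep almost all or almost none of the files
        if percent > 95.0 or percent < 5.0:
            print(f'Selecting {percent:.2f} % of the files.')
        counter.rand_select(percent)
        select_word = counter.count_sub_file_mp()

    return select_word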
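
The percentage computed in this function is plain arithmetic on word counts, which the following standalone sketch isolates. The helper name `target_percent` and the totals in the example are hypothetical and chosen only to show the two branches. Because files differ in size, selecting that percentage of files only approximates the word target, as the outputs further down show.

```python
def target_percent(num_word, total_word):
    """Return the share of files (in percent) expected to yield about num_word words.

    A value above 100 means the subset holds fewer words than requested.
    """
    if total_word <= 0:
        raise ValueError("total_word must be positive")
    return num_word / total_word * 100


# Hypothetical totals, not taken from the corpora.
print(target_percent(10e6, 40e6))  # 25.0
print(target_percent(10e6, 8e6))   # 125.0
```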

In [6]:
year_list = list(range(1991, 2005))
print(year_list)

[1991, 1992, 1993, 1994, 1995, 1996, 1997, 1998, 1999, 2000, 2001, 2002, 2003, 2004]


In [8]:
num_word = 10e6 # 10 million

### ML

In [9]:
# set input
dir_in = '/home/projects/semmetrix/chilecto/corp/giga20181203/xin_cmn'

source_list = [['dir', dir_in]]

In [10]:
counter = cu.CorpCounter(source_list, 'wpr')
counter.count_all_file_mp()

2020-03-30 10:44:43,335 - cu.gen - INFO - Read file list in folder: /home/semmetrix/chilecto/corp/giga20181203/xin_cmn.
2020-03-30 10:49:10,583 - cu.gen - INFO - All files counted in 262.2 sec.
2020-03-30 10:49:11,154 - cu.gen - INFO - All sub-files counted in 0.6 sec.


329285791

In [11]:
ml_list = []

for year in year_list:
    print(f'Running for {year:d}: ')
    res = select_size_year(counter, year, num_word)
    print(f'    {res:d} words selected, number of files: {len(counter.sub_file_list):d}')
    ml_list.extend(counter.sub_file_list)
    

2020-03-30 10:49:41,632 - cu.gen - INFO - All sub-files counted in 0.0 sec.
2020-03-30 10:49:41,641 - cu.gen - INFO - All sub-files counted in 0.0 sec.
2020-03-30 10:49:41,724 - cu.gen - INFO - All sub-files counted in 0.0 sec.
2020-03-30 10:49:41,733 - cu.gen - INFO - All sub-files counted in 0.0 sec.


Running for 1991: 
    9974004 words selected, number of files: 289
Running for 1992: 
    10044502 words selected, number of files: 160
Running for 1993: 


2020-03-30 10:49:41,799 - cu.gen - INFO - All sub-files counted in 0.0 sec.
2020-03-30 10:49:41,807 - cu.gen - INFO - All sub-files counted in 0.0 sec.
2020-03-30 10:49:41,868 - cu.gen - INFO - All sub-files counted in 0.0 sec.
2020-03-30 10:49:41,878 - cu.gen - INFO - All sub-files counted in 0.0 sec.
2020-03-30 10:49:41,945 - cu.gen - INFO - All sub-files counted in 0.0 sec.
2020-03-30 10:49:41,959 - cu.gen - INFO - All sub-files counted in 0.0 sec.
2020-03-30 10:49:42,030 - cu.gen - INFO - All sub-files counted in 0.0 sec.
2020-03-30 10:49:42,046 - cu.gen - INFO - All sub-files counted in 0.0 sec.


    9881153 words selected, number of files: 161
Running for 1994: 
    10184506 words selected, number of files: 157
Running for 1995: 
    10084846 words selected, number of files: 189
Running for 1996: 
    9928812 words selected, number of files: 172
Running for 1997: 


2020-03-30 10:49:42,123 - cu.gen - INFO - All sub-files counted in 0.0 sec.
2020-03-30 10:49:42,141 - cu.gen - INFO - All sub-files counted in 0.0 sec.
2020-03-30 10:49:42,223 - cu.gen - INFO - All sub-files counted in 0.0 sec.
2020-03-30 10:49:42,241 - cu.gen - INFO - All sub-files counted in 0.0 sec.
2020-03-30 10:49:42,329 - cu.gen - INFO - All sub-files counted in 0.0 sec.
2020-03-30 10:49:42,348 - cu.gen - INFO - All sub-files counted in 0.0 sec.


    10050595 words selected, number of files: 167
Running for 1998: 
    9985679 words selected, number of files: 149
Running for 1999: 
    10047470 words selected, number of files: 139
Running for 2000: 


2020-03-30 10:49:42,441 - cu.gen - INFO - All sub-files counted in 0.1 sec.
2020-03-30 10:49:42,461 - cu.gen - INFO - All sub-files counted in 0.0 sec.
2020-03-30 10:49:42,560 - cu.gen - INFO - All sub-files counted in 0.1 sec.
2020-03-30 10:49:42,579 - cu.gen - INFO - All sub-files counted in 0.0 sec.
2020-03-30 10:49:42,680 - cu.gen - INFO - All sub-files counted in 0.1 sec.
2020-03-30 10:49:42,709 - cu.gen - INFO - All sub-files counted in 0.0 sec.


    10136842 words selected, number of files: 137
Running for 2001: 
    10101782 words selected, number of files: 117
Running for 2002: 
    10394575 words selected, number of files: 166
Running for 2003: 


2020-03-30 10:49:42,815 - cu.gen - INFO - All sub-files counted in 0.1 sec.
2020-03-30 10:49:42,841 - cu.gen - INFO - All sub-files counted in 0.0 sec.
2020-03-30 10:49:42,953 - cu.gen - INFO - All sub-files counted in 0.1 sec.
2020-03-30 10:49:42,978 - cu.gen - INFO - All sub-files counted in 0.0 sec.


    9892138 words selected, number of files: 138
Running for 2004: 
    10060779 words selected, number of files: 118


In [12]:
file_name = '/home/projects/semmetrix/chilecto/model/diachronic/ml_10mperyear.fnames'
with open(file_name, 'w') as f:
    f.write('\n'.join(ml_list))
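
A *.fnames* file written this way can be read back into a list with a few lines of standard Python. The helper below is a sketch with a hypothetical name (`read_fnames`); it assumes one file name per line and skips empty lines.

```python
def read_fnames(path):
    """Read a .fnames file and return its non-empty lines as a list of file names."""
    with open(path, encoding="utf-8") as f:
        return [line.strip() for line in f if line.strip()]
```

For example, `read_fnames(file_name)` should return the same names that were joined and written in the cell above.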

### TW

In [13]:
# set input
dir_in = '/home/projects/semmetrix/chilecto/corp/giga20181203/cna_cmn/'

source_list = [['dir', dir_in]]

In [14]:
counter = cu.CorpCounter(source_list, 'wpr')
counter.count_all_file_mp()

2020-03-30 10:49:51,819 - cu.gen - INFO - Read file list in folder: /home/semmetrix/chilecto/corp/giga20181203/cna_cmn/.
2020-03-30 10:55:34,422 - cu.gen - INFO - All files counted in 337.6 sec.
2020-03-30 10:55:35,021 - cu.gen - INFO - All sub-files counted in 0.6 sec.


532850395

In [15]:
tw_list = []

for year in year_list:
    print(f'Running for {year:d}: ')
    res = select_size_year(counter, year, num_word)
    print(f'    {res:d} words selected, number of files: {len(counter.sub_file_list):d}')
    tw_list.extend(counter.sub_file_list)
    

2020-03-30 10:55:35,083 - cu.gen - INFO - All sub-files counted in 0.0 sec.
2020-03-30 10:55:35,087 - cu.gen - INFO - All sub-files counted in 0.0 sec.
2020-03-30 10:55:35,136 - cu.gen - INFO - All sub-files counted in 0.0 sec.
2020-03-30 10:55:35,140 - cu.gen - INFO - All sub-files counted in 0.0 sec.
2020-03-30 10:55:35,196 - cu.gen - INFO - All sub-files counted in 0.0 sec.
2020-03-30 10:55:35,202 - cu.gen - INFO - All sub-files counted in 0.0 sec.
2020-03-30 10:55:35,264 - cu.gen - INFO - All sub-files counted in 0.0 sec.


Running for 1991: 
    9933008 words selected, number of files: 157
Running for 1992: 
    9910828 words selected, number of files: 134
Running for 1993: 
    10058324 words selected, number of files: 131
Running for 1994: 


2020-03-30 10:55:35,271 - cu.gen - INFO - All sub-files counted in 0.0 sec.
2020-03-30 10:55:35,338 - cu.gen - INFO - All sub-files counted in 0.0 sec.
2020-03-30 10:55:35,346 - cu.gen - INFO - All sub-files counted in 0.0 sec.
2020-03-30 10:55:35,418 - cu.gen - INFO - All sub-files counted in 0.0 sec.
2020-03-30 10:55:35,427 - cu.gen - INFO - All sub-files counted in 0.0 sec.
2020-03-30 10:55:35,504 - cu.gen - INFO - All sub-files counted in 0.0 sec.
2020-03-30 10:55:35,513 - cu.gen - INFO - All sub-files counted in 0.0 sec.


    10165254 words selected, number of files: 104
Running for 1995: 
    9911567 words selected, number of files: 93
Running for 1996: 
    9633146 words selected, number of files: 86
Running for 1997: 


2020-03-30 10:55:35,597 - cu.gen - INFO - All sub-files counted in 0.0 sec.
2020-03-30 10:55:35,608 - cu.gen - INFO - All sub-files counted in 0.0 sec.
2020-03-30 10:55:35,695 - cu.gen - INFO - All sub-files counted in 0.0 sec.
2020-03-30 10:55:35,707 - cu.gen - INFO - All sub-files counted in 0.0 sec.
2020-03-30 10:55:35,800 - cu.gen - INFO - All sub-files counted in 0.1 sec.
2020-03-30 10:55:35,812 - cu.gen - INFO - All sub-files counted in 0.0 sec.


    10194896 words selected, number of files: 84
Running for 1998: 
    9914175 words selected, number of files: 78
Running for 1999: 
    10364243 words selected, number of files: 78
Running for 2000: 
    10125873 words selected, number of files: 73
Running for 2001: 


2020-03-30 10:55:35,909 - cu.gen - INFO - All sub-files counted in 0.1 sec.
2020-03-30 10:55:35,921 - cu.gen - INFO - All sub-files counted in 0.0 sec.
2020-03-30 10:55:36,025 - cu.gen - INFO - All sub-files counted in 0.1 sec.
2020-03-30 10:55:36,038 - cu.gen - INFO - All sub-files counted in 0.0 sec.
2020-03-30 10:55:36,142 - cu.gen - INFO - All sub-files counted in 0.1 sec.


    10331920 words selected, number of files: 64
Running for 2002: 
    10385998 words selected, number of files: 67
Running for 2003: 
Corpus does not have enough words
    8955572 words selected, number of files: 343
Running for 2004: 


2020-03-30 10:55:36,253 - cu.gen - INFO - All sub-files counted in 0.1 sec.
2020-03-30 10:55:36,278 - cu.gen - INFO - All sub-files counted in 0.0 sec.


    9636971 words selected, number of files: 120


In [16]:
file_name = '/home/projects/semmetrix/chilecto/model/diachronic/tw_10mperyear.fnames'
with open(file_name, 'w') as f:
    f.write('\n'.join(tw_list))