nikhilchakravartula/LARGE-SORT

Problem Statement: The task is to sort a large 10 GB file using Linux system calls and C library functions. The file is named "file1.csv" and has the following columns:

RollNumber, Name, Specialisation, ProjectInvolved

Points to be noted: I used external sorting to complete this. The program creates the file, divides it into several temporary files of about 1 MB each, sorts each of them using heap sort, and finally merges all the sorted files using an N-way merge. If the process is terminated before completion, for example by a power failure or shutdown, it continues from the point where it left off the next time it is run, although some steps may be redone.
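To make the two sorting stages concrete, here is a minimal in-memory sketch of heap sort on one run followed by an N-way merge of sorted runs. It is not the repository's code: the `Record` and `Run` types and the functions `heap_sort` and `merge_runs` are hypothetical names chosen for illustration, and the runs are plain arrays rather than temporary files read with system calls.

```c
#include <stddef.h>

/* Hypothetical record layout; the repository's real struct may differ. */
typedef struct {
    long roll;       /* sort key: RollNumber */
    char name[32];
} Record;

static void swap_rec(Record *a, Record *b) { Record t = *a; *a = *b; *b = t; }

/* Restore the max-heap property for the subtree rooted at i in a[0..n). */
static void sift_down(Record *a, size_t n, size_t i)
{
    for (;;) {
        size_t largest = i, l = 2 * i + 1, r = 2 * i + 2;
        if (l < n && a[l].roll > a[largest].roll) largest = l;
        if (r < n && a[r].roll > a[largest].roll) largest = r;
        if (largest == i) return;
        swap_rec(&a[i], &a[largest]);
        i = largest;
    }
}

/* In-place heap sort of one run, ascending by roll number. */
void heap_sort(Record *a, size_t n)
{
    if (n < 2) return;
    for (size_t i = n / 2; i-- > 0; )          /* build the max-heap */
        sift_down(a, n, i);
    for (size_t end = n - 1; end > 0; end--) { /* move the max to the end */
        swap_rec(&a[0], &a[end]);
        sift_down(a, end, 0);
    }
}

/* One sorted run held in memory; on disk this would be a temporary file. */
typedef struct {
    const Record *data;
    size_t len;
    size_t pos;      /* index of the next unread record */
} Run;

/* N-way merge: repeatedly emit the smallest head among all runs.
 * A linear scan over the k heads keeps the sketch short; a min-heap of
 * heads would cut each step from O(k) to O(log k).
 * Returns the number of records written to out. */
size_t merge_runs(Run *runs, size_t k, Record *out)
{
    size_t written = 0;
    for (;;) {
        size_t best = k;                       /* k means "none found yet" */
        for (size_t i = 0; i < k; i++) {
            if (runs[i].pos == runs[i].len) continue;   /* run exhausted */
            if (best == k ||
                runs[i].data[runs[i].pos].roll <
                runs[best].data[runs[best].pos].roll)
                best = i;
        }
        if (best == k) return written;         /* every run is exhausted */
        out[written++] = runs[best].data[runs[best].pos++];
    }
}
```

In a file-based version each `Run` would wrap an open file descriptor and a small read buffer, which is why the number of runs merged at once is bounded by the process's open-file limit.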

Unique roll numbers are created using the Fisher-Yates shuffling algorithm, and the name of each student is chosen at random based on the roll number. Although the names seemed meaningless to me, I felt at least someone in this world might have one of them :P. You can change the size of the file that is created by modifying the FILE_SIZE macro in "create_file.h". It is also worth taking a look at "data.h". Take care when modifying the MAX_FD macro in "data.h": it is the maximum number of open file descriptors a process can have in its running environment, and it depends on the system on which the process is running. Increase or decrease it as needed, or else brace for the impact of the program's crash-landing. The RUN_SIZE macro in "file_sort.h" cannot exceed 10^6 (which creates individual files of 977 KB, approximately 1 MB). However, you can reduce it to 16 * 1024 to divide the file into individual 16 KB files, or to 8 * 1024 to work with individual 8 KB files.
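The Fisher-Yates step mentioned above can be sketched as follows: fill an array with consecutive roll numbers, then shuffle it in place so that every number still appears exactly once. The function name `make_unique_rolls` and the `start` parameter are assumptions made for this example, not names taken from the repository.

```c
#include <stddef.h>
#include <stdlib.h>

/* Fill rolls[0..n) with start, start+1, ..., start+n-1 and shuffle them
 * in place with Fisher-Yates, so every roll number stays unique.
 * Call srand() beforehand to vary the order between runs. */
void make_unique_rolls(long *rolls, size_t n, long start)
{
    for (size_t i = 0; i < n; i++)
        rolls[i] = start + (long)i;

    for (size_t i = n; i > 1; i--) {
        /* Pick j in [0, i). rand() % i has a slight modulo bias and is
         * limited by RAND_MAX, which is acceptable for a sketch but not
         * for very large n on platforms with a small RAND_MAX. */
        size_t j = (size_t)rand() % i;
        long t = rolls[i - 1];
        rolls[i - 1] = rolls[j];
        rolls[j] = t;
    }
}
```

Because the shuffle only swaps elements of a list that was unique to begin with, no duplicate check is needed afterwards.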

This program may take a lot of time depending on the system's resources. On a machine with 8 GB of RAM and a 2.40 GHz Intel Xeon processor, it took 3 days flat to complete. Were it not for the reliance on system calls, the same task could be done in much less time. Many temporary files are created in the process to hold the current status of the sort, so make sure you have enough free disk space, probably around 15 GB. These temporary files are deleted once they have been dealt with, so all that remains after running the program is the sorted version of file1.csv.
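Before starting a multi-day run, it may help to check the two environmental limits discussed above: the open-file limit that MAX_FD should respect, and the free disk space. The sketch below uses the POSIX calls `getrlimit` and `statvfs`; the helper names `soft_fd_limit` and `free_bytes` are hypothetical and not part of the repository.

```c
#include <sys/resource.h>
#include <sys/statvfs.h>

/* Store the soft limit on open file descriptors for this process.
 * Returns 0 on success, -1 on error. The value may be RLIM_INFINITY. */
int soft_fd_limit(rlim_t *limit)
{
    struct rlimit rl;
    if (getrlimit(RLIMIT_NOFILE, &rl) != 0)
        return -1;
    *limit = rl.rlim_cur;
    return 0;
}

/* Store the number of bytes available to unprivileged users on the
 * filesystem that contains path. Returns 0 on success, -1 on error. */
int free_bytes(const char *path, unsigned long long *bytes)
{
    struct statvfs vfs;
    if (statvfs(path, &vfs) != 0)
        return -1;
    *bytes = (unsigned long long)vfs.f_bavail * vfs.f_frsize;
    return 0;
}
```

As a rough guide, splitting a 10 GB file into runs of about 1 MB produces on the order of ten thousand temporary files, which is well above the soft limit of 1024 descriptors that many Linux systems use by default. The same limit can be inspected from a shell with `ulimit -n`.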
