This repository contains a project completed for GMIT's Theory of Algorithms module. It is a program written in the C programming language that implements the Secure Hash Algorithm (SHA), specifically the 256-bit version known as SHA-256. The implementation is a single C file that calculates the SHA-256 checksum of an input. The algorithm is based on the Secure Hash Standard document published by the National Institute of Standards and Technology (NIST).
This standard specifies hash algorithms that can be used to generate digests of messages. The digests are used to detect whether messages have been changed since the digests were generated. The table below presents the basic properties of the algorithm.
Algorithm | Message Size (bits) | Block Size (bits) | Word Size (bits) | Message Digest Size (bits) |
---|---|---|---|---|
SHA-256 | < 2^64 | 512 | 32 | 256 |
Before SHA-256 can be performed on an input, some preprocessing is required. The steps are outlined in Section 5 of the Federal Information Processing Standards Publication 180-4 (FIPS PUB 180-4).
- Padding the message (Section 5.1).
- Parse message into message blocks (Section 5.2).
- Set initial hash value (Section 5.3).
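Padding works as follows:
- Append a "1" bit to the end of the message.
- Append enough "0" bits that exactly 64 bits remain at the end of the last 512-bit block.
- In the remaining 64 bits, encode the length of the message in bits (nobits) as a big-endian binary integer.
The program pads and parses the file as it reads it:
- Read 64 bytes at a time from the file into the message block.
- If the last fread returns fewer than 56 bytes, put all of the padding into that last message block (the byte 0x80 adds the "1" bit and seven "0" bits).
- If there are not enough bytes left at the end of the block for the padding:
  - Create a new message block.
  - It contains only padding.
- If the file length is an exact multiple of 512 bits:
  - Create another message block whose first bit is "1".
  - Follow it with "0" bits.
  - The last 64 bits are the number of bits in the original file.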
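The padding rules above can be sketched in C for a message that is already in memory. This is a minimal illustration, not the code from this repository: `padded_length` and `pad_message` are hypothetical names, and the program itself pads while reading the file in 64-byte blocks.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Size in bytes of the padded message: the message, one 0x80 byte,
 * zero bytes, and an 8-byte length, rounded up to a multiple of 64. */
size_t padded_length(size_t len)
{
    return ((len + 1 + 8 + 63) / 64) * 64;
}

/* Hypothetical helper: writes the padded form of msg into out, which
 * must hold at least padded_length(len) bytes. Returns the byte count. */
size_t pad_message(const uint8_t *msg, size_t len, uint8_t *out)
{
    assert(out != NULL);
    size_t total = padded_length(len);
    uint64_t nobits = (uint64_t)len * 8;        /* message length in bits */

    if (len > 0)
        memcpy(out, msg, len);
    out[len] = 0x80;                            /* "1" bit plus seven "0" bits */
    memset(out + len + 1, 0, total - len - 1);  /* remaining "0" bits */

    /* Store the bit length big-endian in the final 8 bytes. */
    for (int i = 0; i < 8; i++)
        out[total - 1 - i] = (uint8_t)(nobits >> (8 * i));

    return total;
}
```

A 55-byte message still fits in one block, while a 56-byte message needs a second block because the "1" bit and the 64-bit length no longer fit in the first.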
Each message block M(1), M(2), ..., M(N) is processed in order using the steps defined below:
- Prepare the message schedule W(t).
- Initialize the eight working variables a, b, c, d, e, f, g, h with the (i-1)th hash value H(i-1).
- For t=0 to 63, compute new values for the working variables.
- Compute the ith intermediate hash value H(i).
After repeating steps one to four N times, the resulting 256-bit message digest of M is H(0)(N), H(1)(N), ..., H(7)(N).
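As a rough illustration of the four steps above, the sketch below processes a single 512-bit block. The names `sha256_process_block` and `SHA256_H0` are assumptions made for this example and may not match sha256.c; the functions and constants are those given in FIPS PUB 180-4 (Sections 4.1.2, 4.2.2 and 5.3.3).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ROTR(x, n)   (((x) >> (n)) | ((x) << (32 - (n))))
#define CH(x, y, z)  (((x) & (y)) ^ (~(x) & (z)))
#define MAJ(x, y, z) (((x) & (y)) ^ ((x) & (z)) ^ ((y) & (z)))
#define BSIG0(x)     (ROTR(x, 2) ^ ROTR(x, 13) ^ ROTR(x, 22))
#define BSIG1(x)     (ROTR(x, 6) ^ ROTR(x, 11) ^ ROTR(x, 25))
#define SSIG0(x)     (ROTR(x, 7) ^ ROTR(x, 18) ^ ((x) >> 3))
#define SSIG1(x)     (ROTR(x, 17) ^ ROTR(x, 19) ^ ((x) >> 10))

/* Round constants K(0)..K(63). */
static const uint32_t K[64] = {
    0x428a2f98, 0x71374491, 0xb5c0fbcf, 0xe9b5dba5, 0x3956c25b, 0x59f111f1, 0x923f82a4, 0xab1c5ed5,
    0xd807aa98, 0x12835b01, 0x243185be, 0x550c7dc3, 0x72be5d74, 0x80deb1fe, 0x9bdc06a7, 0xc19bf174,
    0xe49b69c1, 0xefbe4786, 0x0fc19dc6, 0x240ca1cc, 0x2de92c6f, 0x4a7484aa, 0x5cb0a9dc, 0x76f988da,
    0x983e5152, 0xa831c66d, 0xb00327c8, 0xbf597fc7, 0xc6e00bf3, 0xd5a79147, 0x06ca6351, 0x14292967,
    0x27b70a85, 0x2e1b2138, 0x4d2c6dfc, 0x53380d13, 0x650a7354, 0x766a0abb, 0x81c2c92e, 0x92722c85,
    0xa2bfe8a1, 0xa81a664b, 0xc24b8b70, 0xc76c51a3, 0xd192e819, 0xd6990624, 0xf40e3585, 0x106aa070,
    0x19a4c116, 0x1e376c08, 0x2748774c, 0x34b0bcb5, 0x391c0cb3, 0x4ed8aa4a, 0x5b9cca4f, 0x682e6ff3,
    0x748f82ee, 0x78a5636f, 0x84c87814, 0x8cc70208, 0x90befffa, 0xa4506ceb, 0xbef9a3f7, 0xc67178f2
};

/* Initial hash value H(0). */
const uint32_t SHA256_H0[8] = {
    0x6a09e667, 0xbb67ae85, 0x3c6ef372, 0xa54ff53a,
    0x510e527f, 0x9b05688c, 0x1f83d9ab, 0x5be0cd19
};

/* Updates the hash value H with one 64-byte message block. */
void sha256_process_block(uint32_t H[8], const uint8_t block[64])
{
    uint32_t W[64];
    assert(H != NULL && block != NULL);

    /* Step 1: prepare the message schedule W(t) from big-endian words. */
    for (int t = 0; t < 16; t++)
        W[t] = ((uint32_t)block[4 * t] << 24) | ((uint32_t)block[4 * t + 1] << 16) |
               ((uint32_t)block[4 * t + 2] << 8) | (uint32_t)block[4 * t + 3];
    for (int t = 16; t < 64; t++)
        W[t] = SSIG1(W[t - 2]) + W[t - 7] + SSIG0(W[t - 15]) + W[t - 16];

    /* Step 2: initialize the working variables with the previous hash value. */
    uint32_t a = H[0], b = H[1], c = H[2], d = H[3];
    uint32_t e = H[4], f = H[5], g = H[6], h = H[7];

    /* Step 3: 64 rounds. */
    for (int t = 0; t < 64; t++) {
        uint32_t T1 = h + BSIG1(e) + CH(e, f, g) + K[t] + W[t];
        uint32_t T2 = BSIG0(a) + MAJ(a, b, c);
        h = g; g = f; f = e; e = d + T1;
        d = c; c = b; b = a; a = T1 + T2;
    }

    /* Step 4: compute the intermediate hash value. */
    H[0] += a; H[1] += b; H[2] += c; H[3] += d;
    H[4] += e; H[5] += f; H[6] += g; H[7] += h;
}
```

Reading each word byte by byte in step 1 keeps the sketch independent of the host's byte order, which is a design choice of this example rather than a description of the repository's code.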
The only requirement for this program is a C compiler. There are two ways to get one:
- Install a C compiler on your machine.
- Use the online service available at onlinegdb to compile the C file.
Clone this repository to your machine.
- Navigate to the directory where you want the repository.
- In a command prompt, enter:
> git clone https://github.com/kbarry91/4th-year-SHA-256-Algorithm.git
Navigate to the downloaded repository and enter:
> gcc -o sha256 sha256.c
This will compile the program and add a sha256 executable to the directory.
The program has been designed to work in three different ways:
- Enter a file as a command line argument.
- Enter the filename as a string at runtime.
- Enter a string to be hashed at runtime.
To hash a file from the command line, enter the executable followed by the file to be hashed.
> ./sha256 filename.txt
The program has been designed to check whether a file was entered as an argument. If not, you will be given the option to select 1 and enter the file name. Simply enter the path and file name. To run the program:
> ./sha256
The program also allows the user to enter a string and compute its checksum. Simply select option 2 at the main menu and enter the string. To run the program:
> ./sha256
The algorithm was tested using the test vectors approved by the National Institute of Standards and Technology, available at DI Management. Testing was carried out on both Linux and Windows machines and returned the same results. To verify the results, each checksum was compared with the results from two other resources.
To run the following tests, the corresponding test files have been added to the test-files folder. When testing via a command line argument or at runtime, the files can be referenced as test-files/test1.txt
Input | Expected Result | Actual Result | PASS/FAIL |
---|---|---|---|
abc | ba7816bf 8f01cfea 414140de 5dae2223 b00361a3 96177a9c b410ff61 f20015ad | ba7816bf 8f01cfea 414140de 5dae2223 b00361a3 96177a9c b410ff61 f20015ad | PASS |
Input | Expected Result | Actual Result | PASS/FAIL |
---|---|---|---|
empty string "" | e3b0c442 98fc1c14 9afbf4c8 996fb924 27ae41e4 649b934c a495991b 7852b855 | e3b0c442 98fc1c14 9afbf4c8 996fb924 27ae41e4 649b934c a495991b 7852b855 | PASS |
Input | Expected Result | Actual Result | PASS/FAIL |
---|---|---|---|
abcdbcdecdefdefgefghfghighijhijkijkljklmklmnlmnomnopnopq | 248d6a61 d20638b8 e5c02693 0c3e6039 a33ce459 64ff2167 f6ecedd4 19db06c1 | 248d6a61 d20638b8 e5c02693 0c3e6039 a33ce459 64ff2167 f6ecedd4 19db06c1 | PASS |
Input | Expected Result | Actual Result | PASS/FAIL |
---|---|---|---|
testing on file sha256.c | d10c44abceaca287a2fae35c70436b4ba30286baaac0ad044e3fcf9612799fe8 | f8b8ff1646d46d6cebf61110ccde701b0f076904d2dcf3a1177cd6c6d86f561c | FAIL see known bugs section |
A file must be successfully opened in order to run the program. This is achieved by calling fopen() and checking the result for NULL. If a file isn't provided as an argument, the user can enter a file name, and that file is checked instead.
    FILE *file;
    char fileName[256]; // Buffer for a name entered at runtime (size assumed for this excerpt).
    // Check if file was entered as cmd argument.
    if (argv[1] == NULL)
    {
        printf("No file specified as argument.\nPlease enter a file name: ");
        scanf("%255s", fileName); // Limit input to the buffer size.
        printf("Searching for %s.....\n", fileName);
        file = fopen(fileName, "r");
    }
    else
    {
        file = fopen(argv[1], "r");
    }
    // Check if file opened successfully.
    if (file == NULL)
    {
        printf("[ERROR]: Could not open file.\n");
    }
    else
    {
        // Run Secure Hash Algorithm on the file.
        printf("[FILE READ SUCCESS]: Now running sha256 Hash Computation.....\n");
        sha256(file);
    }
After successfully computing the SHA-256 of an input, the checksum is saved to a new file in a folder called saved-hashes. As the file name entered may contain a path and an extension, the path and extension are removed from the name with the help of the library <libgen.h>. The new file name is given the extension .txt and then appended to the path saved-hashes/ to save the file.
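One way the output name could be built is sketched below. `build_output_path` is a hypothetical helper written for this README, not a function taken from sha256.c, and the 256-byte working buffer is an assumption. It relies on the POSIX `basename()` function from `<libgen.h>`, which may modify its argument, so the input is copied first.

```c
#include <assert.h>
#include <libgen.h>   /* basename() (POSIX) */
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: builds "saved-hashes/<name>.txt" from an input path.
 * Returns 0 on success, -1 if a buffer is too small. */
int build_output_path(const char *inputPath, char *out, size_t outSize)
{
    char copy[256];
    assert(inputPath != NULL && out != NULL);

    if (strlen(inputPath) >= sizeof copy)
        return -1;
    strcpy(copy, inputPath);          /* basename() may modify its argument */

    char *name = basename(copy);      /* drop the directory part */
    char *dot = strrchr(name, '.');   /* find the last '.' in the name */
    if (dot != NULL && dot != name)
        *dot = '\0';                  /* drop the extension */

    int n = snprintf(out, outSize, "saved-hashes/%s.txt", name);
    return (n < 0 || (size_t)n >= outSize) ? -1 : 0;
}
```

With this sketch, the input test-files/test1.txt maps to saved-hashes/test1.txt.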
The program allows the user to enter a string to generate its checksum. This is done by saving the string to a file at test-files/userinput.txt. Once saved, the SHA-256 is calculated as normal.
Little-endian and big-endian are two ways of storing multibyte data types (int, float, etc.). On little-endian machines, the least significant byte of a multibyte value is stored first, at the lowest address. On big-endian machines, the most significant byte is stored first.
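To check whether a machine is big-endian or little-endian, the following macro was used:
    #define IS_BIG_ENDIAN (*(uint16_t *)"\0\xff" < 0x100)
The macro reinterprets the two bytes of the string "\0\xff" as a 16-bit integer and compares the result with 0x100. On a big-endian machine the value is 0x00FF, which is less than 0x100; on a little-endian machine it is 0xFF00, which is not.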
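The pointer cast in the macro can run into alignment and aliasing rules on some compilers. As a hedged alternative, the same test can be written as a small function that copies the bytes with `memcpy`. The name `is_big_endian` is an assumption for this sketch and is not part of sha256.c.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Returns 1 on a big-endian machine, 0 on a little-endian one. */
int is_big_endian(void)
{
    const uint8_t bytes[2] = {0x00, 0xff};
    uint16_t value;

    assert(sizeof value == sizeof bytes);
    memcpy(&value, bytes, sizeof value);  /* copy instead of casting the pointer */
    return value < 0x100;                 /* 0x00ff if big-endian, 0xff00 if little-endian */
}
```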
The following macros reverse the byte order of 32-bit and 64-bit integers, which is needed on little-endian machines because SHA-256 is defined in terms of big-endian values:
    #define CONVERT_UINT32(x) (((x) >> 24) | (((x)&0x00FF0000) >> 8) | (((x)&0x0000FF00) << 8) | ((x) << 24))
    #define CONVERT_UINT64(x)                                                      \
        ((((x) >> 56) & 0x00000000000000FF) | (((x) >> 40) & 0x000000000000FF00) | \
         (((x) >> 24) & 0x0000000000FF0000) | (((x) >> 8 ) & 0x00000000FF000000) | \
         (((x) << 8 ) & 0x000000FF00000000) | (((x) << 24) & 0x0000FF0000000000) | \
         (((x) << 40) & 0x00FF000000000000) | (((x) << 56) & 0xFF00000000000000))
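A different technique, shown here only for comparison, avoids the endianness check altogether by assembling and splitting values one byte at a time with shifts. The helpers `load_be32` and `store_be64` are hypothetical names for this sketch and do not appear in sha256.c.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Reads a 32-bit big-endian word from four bytes, on any host. */
uint32_t load_be32(const uint8_t *p)
{
    assert(p != NULL);
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Writes a 64-bit value as eight big-endian bytes, on any host. */
void store_be64(uint8_t *p, uint64_t v)
{
    assert(p != NULL);
    for (int i = 0; i < 8; i++)
        p[i] = (uint8_t)(v >> (56 - 8 * i));
}
```

Because shifts operate on values rather than on memory layout, these helpers behave the same on big-endian and little-endian machines, at the cost of a few more operations per word.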
The program appears to produce an incorrect checksum when tested on large files. After extensive research, I have narrowed the fault down to how line breaks terminate lines on different operating systems, such as a PC running Windows and a web server running Linux.
- Windows - Uses CR and LF characters to terminate lines.
- UNIX - Uses only a single LF character.
- Classic Mac OS (before OS X) - Uses a single CR character.
This can also be caused by the text editor used to create the file; some editors, such as Notepad++, adjust the file to suit the current operating system. The solution below, described by Hanselman, has been implemented in .NET to deal with the issue when writing a file.
    public static String NewLine {
        get {
            Contract.Ensures(Contract.Result<String>() != null);
    #if !PLATFORM_UNIX
            return "\r\n";
    #else
            return "\n";
    #endif // !PLATFORM_UNIX
        }
    }
To solve this issue when reading from a file byte by byte, I would check for a line ending and normalize it, for example on Windows by removing the 0D byte so that the newline is only \n. Due to a lack of time to test the solution extensively, I did not get to implement the change, but the research involved will help me deal with this problem before it arises in future projects.
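The proposed fix could look something like the sketch below, which removes each CR byte (0x0D) that is immediately followed by LF (0x0A) from a buffer. It has not been integrated into or tested against sha256.c, and `strip_cr_before_lf` is a hypothetical name. Removing bytes changes the data being hashed, so the result would match the checksum of the LF-only version of the file.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: removes each CR (0x0D) that is immediately followed
 * by LF (0x0A), in place. Returns the new length of the data. */
size_t strip_cr_before_lf(uint8_t *buf, size_t len)
{
    size_t out = 0;
    assert(buf != NULL || len == 0);

    for (size_t i = 0; i < len; i++) {
        if (buf[i] == 0x0D && i + 1 < len && buf[i + 1] == 0x0A)
            continue;            /* skip the CR, keep the LF that follows */
        buf[out++] = buf[i];
    }
    return out;
}
```

When the file is read in 64-byte blocks, a CR that falls on the last byte of a block would need one byte of lookahead into the next read, which this buffer-based sketch does not handle.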
I feel that the code design could be improved to produce a much cleaner file. The main reason for this is the extra features that I decided to implement at the last minute. I have started practicing an agile approach to development, to deliver the product's business requirements at a faster pace, and if more time were available I would carry out this code clean-up in my next sprint.
As an example, the code uses two functions for writing to a file. These functions could be redesigned to abstract what differs, allowing one function to be used for both types of input and output.
int writeToFile(uint32_t hash[])
int writeToFileInput(char inputString[])
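One possible shape for that refactoring is sketched below: the hash is first formatted as text, so that a single function can write either a hash or a user string. Both `hash_to_hex` and `write_line_to_file` are hypothetical names for this sketch, and it makes no claim about how the existing two functions are implemented.

```c
#include <assert.h>
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: formats the eight hash words as 64 hex digits
 * followed by a terminating '\0'. */
void hash_to_hex(const uint32_t hash[8], char out[65])
{
    assert(hash != NULL && out != NULL);
    for (int i = 0; i < 8; i++)
        snprintf(out + 8 * i, 9, "%08" PRIx32, hash[i]);
}

/* Hypothetical helper: writes one line of text to the file at path.
 * Returns 0 on success, -1 on failure. */
int write_line_to_file(const char *path, const char *text)
{
    FILE *out = fopen(path, "w");
    if (out == NULL)
        return -1;
    int ok = fprintf(out, "%s\n", text) >= 0;
    return (fclose(out) == 0 && ok) ? 0 : -1;
}
```

With this split, saving a hash becomes a call to `hash_to_hex` followed by `write_line_to_file`, and saving a user string is a call to `write_line_to_file` alone.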
Although it works as it should, I would also (given more time) extract the menu operations from the main function to tidy up the code.
To complete this project, a lot of research went into both the SHA-256 algorithm and the C programming language. Any code adapted from outside sources is referenced in the sha256.c file and complies with all licenses and policies[^policies].
Below is a list of some other resources used to conduct research:
- ch maj
- endian conversion
- endian check
- National Institute of Standard and Technology
- DI Management
- Binary representations
- SHA Standard
- String Manipulation
- OS line ending
[^policies]: This project complies with the Quality Assurance Framework at GMIT, which includes the Code of Student Conduct and the Policy on Plagiarism.