In [1]:
import numpy as np

Binary files are computer files that store data in a format that isn't human-readable. They contain raw binary data, which represents machine instructions or data structures.

Key characteristics of binary files:

1. Not human-readable: Binary files contain raw binary data that isn't easily understandable by humans.

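2. Sometimes machine-specific: Executables, and raw data that depends on a particular byte order or word size, may not work across different operating systems or hardware architectures. Standardized formats such as PNG or ZIP are portable.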

3. Efficient storage: Binary files often store data more efficiently compared to text-based formats.

4. Various types: Examples include executable files (.exe), image files (.jpg, .png), audio files (.mp3), video files (.avi, .mov), and compressed files (.zip).

5. File extensions: Many binary file types have specific extensions that indicate their purpose or content.

6. Data representation: Binary files store values directly as bytes (for example, a 32-bit integer occupies four bytes) rather than as the characters of their decimal representation.

7. Compression: Some binary files may be compressed to reduce their size, like ZIP archives.

Binary files are commonly used in applications and systems because they are efficient to store and transmit. However, they can be challenging to work with directly: their contents are structured data, but that structure can only be interpreted by a program that knows the format, not by reading the file as text.
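
As a small illustration of why raw binary data depends on the reader knowing the layout, the sketch below stores the same integers with two explicit byte orders and then reads one of them back with the wrong assumption. The sample values and variable names are made up for this example.

```python
import numpy as np

# The same four integers stored with explicit byte orders.
values = [1, 2, 3, 258]
little = np.array(values, dtype='<i4')  # little-endian 32-bit integers
big = np.array(values, dtype='>i4')     # big-endian 32-bit integers

little_bytes = little.tobytes()
big_bytes = big.tobytes()

# Hex dump of the raw bytes, grouped four bytes (one integer) at a time.
print(little_bytes.hex(' ', 4))
print(big_bytes.hex(' ', 4))

# Reading the bytes back only works if the reader assumes the right layout.
restored = np.frombuffer(little_bytes, dtype='<i4')
misread = np.frombuffer(little_bytes, dtype='>i4')
print(restored)
print(misread)
```

Formats such as .npy avoid this problem by recording the dtype (including byte order) in a header at the start of the file.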

#### Saving and Loading Binary Files
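NumPy's binary formats (.npy and .npz) store the raw array bytes together with the dtype and shape, so arrays are typically faster to save and load than text files and come back with the same data type. The data is not compressed unless you use np.savez_compressed().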

In [6]:
# np.save() & np.load(): save and load individual arrays in NumPy's binary format (.npy)

# Saving a single array
array = np.array([1, 2, 3, 4, 5])
np.save('array_data.npy', array)

# Loading the saved array
loaded_array = np.load('array_data.npy')
print(loaded_array) 


# np.savez(): save multiple arrays in a single uncompressed .npz archive
# (use np.savez_compressed() for a compressed archive)

# Saving multiple arrays
array1 = np.array([1, 2, 3])
array2 = np.array([4, 5, 6])
np.savez('arrays_data.npz', arr1=array1, arr2=array2)

# Loading multiple arrays
loaded_data = np.load('arrays_data.npz')
print(loaded_data['arr1']) 
print(loaded_data['arr2']) 

[1 2 3 4 5]
[1 2 3]
[4 5 6]
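
To see the difference between np.savez() and np.savez_compressed(), the following sketch writes the same array both ways and compares the file sizes. The array of zeros is chosen only because it compresses well; real data will shrink less, and the file names are placeholders inside a temporary directory.

```python
import os
import tempfile
import numpy as np

# A highly repetitive array, chosen so that compression has a visible effect.
zeros = np.zeros(100_000, dtype=np.int64)

with tempfile.TemporaryDirectory() as tmp_dir:
    plain_path = os.path.join(tmp_dir, 'plain.npz')
    packed_path = os.path.join(tmp_dir, 'packed.npz')

    np.savez(plain_path, data=zeros)              # stored without compression
    np.savez_compressed(packed_path, data=zeros)  # stored with compression

    plain_size = os.path.getsize(plain_path)
    packed_size = os.path.getsize(packed_path)

    # Both archives load the same way; np.load returns a dict-like object.
    with np.load(packed_path) as archive:
        restored = archive['data']

print(plain_size, packed_size)
print(restored.dtype, restored.shape)
```

Compression trades some CPU time on save and load for a smaller file, so it tends to pay off mainly for large or repetitive arrays.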


#### Text Files
Text files are a common format for datasets, particularly .txt and .csv files. 

In [11]:
# np.savetxt(): save a 1D or 2D array to a text file, with customizable formatting

array = np.array([[1,2,3],[4,5,6]])
# fmt='%d' writes each number as an integer, without a decimal point
np.savetxt('array_data.txt', array, fmt='%d', delimiter=',')

# np.loadtxt(): load data from a text file into a NumPy array; you can specify the delimiter and data type
loaded_array = np.loadtxt('array_data.txt', delimiter=',', dtype=int)
print(loaded_array)

# np.genfromtxt(): similar to loadtxt, but it can also handle missing values and per-column data types
data = np.genfromtxt('array_data.txt', delimiter=',', dtype=int, filling_values=0)
print(data)

[[1 2 3]
 [4 5 6]]
[[1 2 3]
 [4 5 6]]
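
The file above has no gaps, so filling_values has no visible effect there. The sketch below uses a small in-memory CSV with two empty fields (hypothetical data, read through io.StringIO instead of a file) to show how np.genfromtxt() treats missing values.

```python
import io
import numpy as np

# Hypothetical CSV content with two empty fields.
csv_text = "1,2,3\n4,,6\n,8,9\n"

# With an integer dtype, missing fields are replaced by filling_values.
filled = np.genfromtxt(io.StringIO(csv_text), delimiter=',', dtype=int, filling_values=0)
print(filled)

# With the default float dtype, missing fields become nan.
as_float = np.genfromtxt(io.StringIO(csv_text), delimiter=',')
print(as_float)
```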


#### Handling CSV Files with np.genfromtxt()
For more structured data with headers or mixed data types, np.genfromtxt() can return a structured array whose columns are accessed by name.

In [25]:
data = np.genfromtxt('username.csv', delimiter=';', names=True, dtype=None, encoding=None)
# names=True: the first row of the CSV file contains the column headers
# dtype=None: NumPy infers the data type of each column from its contents
# encoding=None: use the system default encoding; pass 'utf-8' to be explicit
print(data['Username'])

['booker12' 'grey07' 'johnson81' 'jenkins46' 'smith79']


#### Binary Data with Memory Mapping
When working with very large arrays that won’t fit into memory all at once, you can use memory-mapped files (np.memmap). This lets NumPy treat a file on disk as if it were an in-memory array, loading only the parts you access.
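
A minimal sketch of memory mapping with np.memmap is shown below. The file name, shape, and dtype are arbitrary choices for this example, and the array is kept small so it runs quickly; the same calls apply to files much larger than memory.

```python
import os
import tempfile
import numpy as np

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'big_array.dat')

    # Create a file-backed array; mode='w+' creates or overwrites the file.
    writer = np.memmap(path, dtype='float32', mode='w+', shape=(1000, 100))
    writer[:] = np.arange(1000, dtype='float32')[:, None]  # row i is filled with i
    writer.flush()  # make sure the changes are written to disk
    del writer      # release the mapping

    # Reopen read-only. A raw memmap file has no header, so the dtype and
    # shape must be supplied again.
    reader = np.memmap(path, dtype='float32', mode='r', shape=(1000, 100))
    chunk = np.array(reader[500:503, :2])  # copy only a small slice into memory
    del reader      # release the mapping before the directory is removed

print(chunk)
```

For arrays saved with np.save(), passing mmap_mode='r' to np.load() gives a similar memory-mapped view without having to restate the dtype and shape.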

#### String Conversion
NumPy also provides functions for converting between strings and arrays.

In [33]:
# np.array2string(): converts an array to a string for display or saving in a text format

array = np.array([1,2,3])
string_data = np.array2string(array)
print(string_data)

# np.fromstring(): create an array from a string; with sep set, the text is parsed as numbers (float by default)
string = '1 2 3 4 5'
array = np.fromstring(string, sep=' ')
print(array)

[1 2 3]
[1. 2. 3. 4. 5.]
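
The two functions can be combined into a simple round trip, sketched below for a small integer array. This is an illustration, not a robust serialization format: np.array2string() abbreviates large arrays with '...', so np.save() or np.savetxt() is the better choice for real data.

```python
import numpy as np

original = np.array([1, 2, 3])

# Convert to text with an explicit separator so it is easy to parse again.
text = np.array2string(original, separator=',')
print(text)

# Strip the surrounding brackets and parse the numbers back as integers.
parsed = np.fromstring(text.strip('[]'), sep=',', dtype=int)
print(parsed)
```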
